word_embedding_loader package
Submodules
word_embedding_loader.cli module
word_embedding_loader.exceptions module
- exception word_embedding_loader.exceptions.ParseError
  Bases: exceptions.Exception
- exception word_embedding_loader.exceptions.ParseWarning
  Bases: exceptions.Warning
- word_embedding_loader.exceptions.parse_warn(message)
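The exceptions module above lists only its signatures. The sketch below shows one plausible way the three names fit together using Python's standard `warnings` module. It is an assumption, not the package's source: in particular, it assumes that `parse_warn` emits its message as a `ParseWarning`. It is also written for Python 3, while the listed bases (`exceptions.Exception`, `exceptions.Warning`) are Python 2 names.

```python
import warnings


class ParseError(Exception):
    """Sketch: raised when an embedding file cannot be parsed at all."""


class ParseWarning(Warning):
    """Sketch: warning category for recoverable problems found while parsing."""


def parse_warn(message):
    # Assumption: parse_warn reports `message` as a ParseWarning through the
    # warnings module, so callers can filter or escalate it like any warning.
    warnings.warn(message, ParseWarning, stacklevel=2)
```

Because `ParseWarning` is an ordinary warning category, a caller that prefers strict parsing could run `warnings.simplefilter("error", ParseWarning)` to turn these warnings into exceptions.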
word_embedding_loader.word_embedding module
- class word_embedding_loader.word_embedding.WordEmbedding(vectors, vocab, freqs=None)
  Bases: object
  Main API for loading and saving pretrained word embedding files.
  Note: You do not need to call the initializer directly in normal usage. Instead, call load().
  Parameters:
  - vectors (numpy.ndarray) – Word embedding representation vectors.
  - vocab (dict) – Mapping from words (bytes) to vector indices (int).
  - freqs (dict or None) – Mapping from words (bytes) to word frequency counts (int).
- vectors
  numpy.ndarray – Word embedding vectors with shape (vocabulary size, feature dimension).
- vocab
  dict – Mapping from words (bytes) to vector indices (int).
- freqs
  dict or None – Mapping from words (bytes) to frequency counts (int).
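The three attributes are linked by index: row `vocab[w]` of `vectors` is the vector for word `w`. With a NumPy array, `vectors.shape` is (vocabulary size, feature dimension). The sketch below illustrates that relationship. It also shows how counts from a word2vec `-save-vocab` file (one "word count" pair per line) could become a `freqs` dict and a frequency-ordered `vocab`.

The helper names are hypothetical and are not part of the package. Plain lists stand in for `numpy.ndarray` so the example runs without NumPy. `lookup` works unchanged on a 2-D array.

```python
def read_save_vocab(lines):
    """Hypothetical helper: parse word2vec -save-vocab lines into {word(bytes): count}."""
    freqs = {}
    for line in lines:
        # Each line is "<word> <count>"; split on the last space only.
        word, count = line.rstrip(b"\n").rsplit(b" ", 1)
        freqs[word] = int(count)
    return freqs


def build_vocab(freqs):
    """Hypothetical helper: assign indices in descending frequency (stable for ties)."""
    ordered = sorted(freqs, key=lambda w: -freqs[w])
    return {word: index for index, word in enumerate(ordered)}


def lookup(vectors, vocab, word):
    """Return the row of `vectors` that belongs to `word` (bytes)."""
    return vectors[vocab[word]]
```

With `vectors` built in the same order as `vocab`, the feature dimension reported by `size` corresponds to the length of any row.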
- classmethod load(path, vocab=None, dtype=numpy.float32, max_vocab=None, format=None, binary=False)
  Load a pretrained word embedding from a file.
  Parameters:
  - path (str) – Path of the file to load.
  - vocab (str or None) – Path to a vocabulary file created by word2vec with the -save-vocab <file> option. If vocab is given, vectors and vocab are ordered in descending order of frequency.
  - dtype (numpy.dtype) – Element data type to use for the array.
  - max_vocab (int or None) – Maximum number of vocabulary entries to read.
  - format (str or None) – Format of the file: 'word2vec' for files produced by word2vec, by Mikolov et al., or 'glove' for files produced by GloVe (Global Vectors for Word Representation), by Jeffrey Pennington, Richard Socher and Christopher D. Manning of the Stanford NLP Group. If None is given, the format is guessed from the content.
  - binary (bool) – Load the file as a binary file, as produced by word2vec with the -binary 1 option. If format is 'glove' or None, this argument is simply ignored.
Returns:
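The text formats named in `load()` differ mainly in their first line. A word2vec text file starts with a "<vocabulary size> <dimension>" header, and GloVe files have no header. In both formats, every other line is a word followed by its vector components.

The hypothetical `parse_text_embedding` below is a minimal sketch of that distinction and of a `max_vocab` cut-off. It is not the package's loader. It returns plain lists instead of a typed NumPy array, and it ignores the `binary`, `dtype` and `vocab` options.

```python
import itertools


def parse_text_embedding(lines, max_vocab=None):
    """Sketch: parse word2vec-style or GloVe-style text lines (bytes) into (vectors, vocab)."""
    lines = iter(lines)
    first = next(lines).split()
    # Heuristic: a two-token all-digit first line is a word2vec header.
    is_header = len(first) == 2 and all(tok.isdigit() for tok in first)
    rows = (line.split() for line in lines)
    if not is_header:
        # GloVe: the first line is already a data row, so put it back.
        rows = itertools.chain([first], rows)
    vectors, vocab = [], {}
    for tokens in rows:
        if max_vocab is not None and len(vectors) >= max_vocab:
            break  # stop once max_vocab words have been read
        if not tokens:
            continue  # skip blank lines
        word, values = tokens[0], tokens[1:]
        vocab[word] = len(vectors)
        vectors.append([float(v) for v in values])
    return vectors, vocab
```

Reading lazily from an iterator means that `max_vocab` stops the parse early instead of loading the whole file and truncating it afterwards.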
- save(path, format, binary=False, use_load_condition=False)
  Save the object as a word embedding file. For most arguments, refer to load().
  Parameters: use_load_condition (bool) – If True, the options that were passed to load() are used.
  Raises: ValueError – if use_load_condition is True but the object was not initialized via load().
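The `use_load_condition` contract can be illustrated with a small stand-in class. `EmbeddingSketch` is hypothetical and only mirrors the documented behaviour: it raises `ValueError` when `use_load_condition` is True but no load options were recorded.

Two further choices are assumptions. First, the recorded `format` and `binary` options override the arguments. Second, the writer only handles word2vec text output, with a header line followed by one "word v1 v2 ..." line per vocabulary index.

```python
class EmbeddingSketch:
    """Hypothetical stand-in for WordEmbedding, reduced to the save() contract."""

    def __init__(self, vectors, vocab, load_condition=None):
        self.vectors = vectors  # one list of floats per vocabulary index
        self.vocab = vocab      # bytes word -> row index
        # Options recorded by a load()-like constructor, or None if built directly.
        self._load_condition = load_condition

    def save(self, path, format, binary=False, use_load_condition=False):
        if use_load_condition:
            if self._load_condition is None:
                raise ValueError(
                    "use_load_condition=True but the object was not created via load()")
            # Assumption: recorded load() options take precedence over the arguments.
            format = self._load_condition["format"]
            binary = self._load_condition["binary"]
        if format != "word2vec" or binary:
            raise NotImplementedError("this sketch only writes word2vec text files")
        # Write words in index order so line i + 1 of the file holds vectors[i].
        words = sorted(self.vocab, key=self.vocab.get)
        dim = len(self.vectors[0]) if self.vectors else 0
        with open(path, "wb") as f:
            f.write(b"%d %d\n" % (len(words), dim))
            for word in words:
                values = " ".join(repr(x) for x in self.vectors[self.vocab[word]])
                f.write(word + b" " + values.encode("ascii") + b"\n")
```

The check runs before the file is opened, so a misconfigured call fails without leaving a partial file behind.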
- size
  Feature dimension of the loaded vectors.
  Returns: int
- word_embedding_loader.word_embedding.classify_format(f)
  Determine the format of a word embedding file from its content. This operation only looks at the first two lines and does not check the sanity of the input file.
  Parameters: f (Filelike)
  Returns: class
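A first-two-lines check like the one described for `classify_format` can be sketched as follows. The hypothetical `classify_format_sketch` assumes a file object opened in binary mode. It returns a format name string rather than the class that the real function returns, and it raises `ValueError` where the package might use its own `ParseError`.

```python
def classify_format_sketch(f):
    """Sketch: guess 'word2vec' or 'glove' from at most the first two lines of f."""
    first = f.readline().split()
    # word2vec files (text or binary) open with a "<vocab size> <dimension>" header.
    if len(first) == 2 and all(tok.isdigit() for tok in first):
        return "word2vec"
    # GloVe lines are a word followed by its components, so the first two
    # lines should contain the same number of tokens.
    second = f.readline().split()
    if len(first) > 1 and len(first) == len(second):
        return "glove"
    raise ValueError("could not recognise the embedding file format")
```

Returning before the second line is read means that, for a binary word2vec file, the sketch never tries to split raw float bytes as text.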