word_embedding_loader.loader package
The loader package provides the actual implementations of the file loaders.
Warning
This is an internal implementation. The API may change without notice in the future, so you should use word_embedding_loader.word_embedding.WordEmbedding instead.
Submodules
word_embedding_loader.loader.glove module
Low-level API for loading word embedding files in the format produced by GloVe (Global Vectors for Word Representation), developed by Jeffrey Pennington, Richard Socher, and Christopher D. Manning of the Stanford NLP Group.
word_embedding_loader.loader.glove.check_valid(line0, line1)
Check if a file is in valid GloVe format.
Parameters:
- line0 (bytes) – First line of the file
- line1 (bytes) – Second line of the file
Returns: True if it is valid, False if it is invalid.
Return type: bool
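To make the idea of a two-line format check concrete, here is a minimal sketch in Python 3. It is not the library's implementation: the function name `looks_like_glove` and the heuristic itself (each line is a token followed by floats, and both lines have the same number of fields) are assumptions made for illustration.

```python
def looks_like_glove(line0, line1):
    """Hypothetical heuristic: both lines look like `token v1 v2 ...`."""
    def field_count(line):
        fields = line.rstrip().split(b" ")
        if len(fields) < 2:
            return None
        try:
            # Everything after the token must parse as a float.
            [float(x) for x in fields[1:]]
        except ValueError:
            return None
        return len(fields)

    n0, n1 = field_count(line0), field_count(line1)
    # A word2vec text header ("<vocab_size> <dim>") has only two fields, so it
    # normally has a different field count than the vector line that follows.
    return n0 is not None and n0 == n1


glove_ok = looks_like_glove(b"the 0.1 0.2 0.3\n", b"of 0.4 0.5 0.6\n")
header_first = looks_like_glove(b"2 3\n", b"the 0.1 0.2 0.3\n")
```

Under this heuristic a file whose first line is a word2vec-style header is rejected, which is one plausible way to tell the two text formats apart.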
word_embedding_loader.loader.glove.load(fin, dtype=numpy.float32, max_vocab=None)
Load word embedding file.
Parameters:
- fin (File) – File object to read. The file should be open for reading ASCII.
- dtype (numpy.dtype) – Element data type to use for the array.
- max_vocab (int) – Number of vocabulary entries to read.
Returns:
- numpy.ndarray: Word embedding representation vectors.
- dict: Mapping from words to vector indices.
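The following sketch illustrates the kind of parsing described above for a GloVe text file, where each line is a word followed by its vector components. It is an assumption-laden illustration in Python 3, not the library's code: `load_glove_sketch` is a hypothetical name, and plain lists stand in for the numpy.ndarray so the example needs only the standard library.

```python
import io


def load_glove_sketch(fin, max_vocab=None):
    """Hypothetical parser for `word v1 v2 ...` lines; returns (rows, vocab)."""
    rows = []   # rows[i] is the vector of the word with index i
    vocab = {}  # word (bytes) -> row index
    for line in fin:
        if max_vocab is not None and len(rows) >= max_vocab:
            break
        fields = line.rstrip().split(b" ")
        vocab[fields[0]] = len(rows)
        rows.append([float(x) for x in fields[1:]])
    return rows, vocab


fin = io.BytesIO(b"the 0.1 0.2\nof 0.3 0.4\nand 0.5 0.6\n")
rows, vocab = load_glove_sketch(fin, max_vocab=2)
```

With `max_vocab=2`, only the first two lines are read, so `vocab` maps `b"the"` and `b"of"` to rows 0 and 1.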
word_embedding_loader.loader.glove.load_with_vocab(fin, vocab, dtype=numpy.float32)
Load word embedding file with a predefined vocabulary.
Parameters:
- fin (File) – File object to read. The file should be open for reading ASCII.
- vocab (dict) – Mapping from words (bytes) to vector indices (int).
- dtype (numpy.dtype) – Element data type to use for the array.
Returns: Word embedding representation vectors
Return type: numpy.ndarray
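As an illustration of loading with a predefined vocabulary, the sketch below places each vector at the row index that the vocabulary assigns to its word and skips words not in the vocabulary. The name `load_with_vocab_sketch` is hypothetical, lists replace the numpy.ndarray, and leaving `None` for words missing from the file is an assumption; how the library treats that case is not stated here.

```python
import io


def load_with_vocab_sketch(fin, vocab):
    """Hypothetical loader: row order is dictated by `vocab`, not by the file."""
    rows = [None] * len(vocab)
    for line in fin:
        fields = line.rstrip().split(b" ")
        index = vocab.get(fields[0])
        if index is not None:
            rows[index] = [float(x) for x in fields[1:]]
    return rows


vocab = {b"of": 0, b"the": 1}
fin = io.BytesIO(b"the 0.1 0.2\nof 0.3 0.4\nand 0.5 0.6\n")
rows = load_with_vocab_sketch(fin, vocab)
```

Here `b"and"` is ignored because it is absent from `vocab`, and the rows come out in vocabulary order rather than file order.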
word_embedding_loader.loader.vocab module
word_embedding_loader.loader.vocab.load_vocab(fin)
Load vocabulary from a vocab file created by word2vec with the -save-vocab <file> option.
Parameters:
- fin (File) – File-like object to read from.
- encoding (bytes) – Encoding of the input file, as defined in the codecs module of the Python standard library.
- errors (bytes) – Error handling scheme. The default error handler is ‘strict’, meaning that encoding errors raise ValueError. Refer to the codecs module for more information.
Returns: Mapping from a word (bytes) to the number of times it appears in the original text (int). Order is preserved from the original vocab file.
Return type: OrderedDict
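A word2vec vocab file stores one `word count` pair per line. The sketch below shows how such a file could be read into an ordered mapping in Python 3. `load_vocab_sketch` is a hypothetical name, and the sketch omits the encoding and error handling options described above.

```python
import io
from collections import OrderedDict


def load_vocab_sketch(fin):
    """Hypothetical reader for `word count` lines, preserving file order."""
    vocab = OrderedDict()
    for line in fin:
        # Split on the last space so the count is always the final field.
        word, count = line.rstrip().rsplit(b" ", 1)
        vocab[word] = int(count)
    return vocab


fin = io.BytesIO(b"</s> 0\nthe 1061396\nof 593677\n")
vocab = load_vocab_sketch(fin)
```

Iterating over `vocab` yields the words in the same order as the file.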
word_embedding_loader.loader.word2vec_bin module
Low-level API for loading word embedding files in the format produced by word2vec, by Mikolov.
This implementation is for word embedding files created with the -binary 1 option.
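For orientation, the binary layout written by word2vec is a text header `<vocab_size> <dim>`, followed for each word by the word itself, a space, `dim` 32-bit floats, and a newline. The sketch below reads that layout with the standard library only. `read_word2vec_bin_sketch` is a hypothetical name, lists stand in for the numpy.ndarray, and little-endian floats are an assumption (word2vec writes them in the byte order of the machine that produced the file).

```python
import io
import struct


def read_word2vec_bin_sketch(fin):
    """Hypothetical reader for the word2vec binary layout; returns (rows, vocab)."""
    vocab_size, dim = (int(x) for x in fin.readline().split())
    vocab, rows = {}, []
    for index in range(vocab_size):
        word = b""
        while True:
            ch = fin.read(1)
            if ch == b" " or ch == b"":
                break
            if ch != b"\n":  # skip the newline that ends the previous record
                word += ch
        vocab[word] = index
        # Assumption: 4-byte little-endian floats.
        rows.append(list(struct.unpack("<%df" % dim, fin.read(4 * dim))))
    return rows, vocab


data = (b"2 2\n"
        + b"the " + struct.pack("<2f", 0.5, -1.0) + b"\n"
        + b"of " + struct.pack("<2f", 2.0, 0.25) + b"\n")
rows, vocab = read_word2vec_bin_sketch(io.BytesIO(data))
```

The sample values are exactly representable as 32-bit floats, so they survive the round trip unchanged.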
word_embedding_loader.loader.word2vec_bin.check_valid()
Refer to word_embedding_loader.loader.glove.check_valid() for the API.
word_embedding_loader.loader.word2vec_bin.load()
Refer to word_embedding_loader.loader.glove.load() for the API.
word_embedding_loader.loader.word2vec_bin.load_with_vocab()
Refer to word_embedding_loader.loader.glove.load_with_vocab() for the API.
word_embedding_loader.loader.word2vec_text module
Low-level API for loading word embedding files in the format produced by word2vec, by Mikolov.
This implementation is for word embedding files created with the -binary 0 option (the default).
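The word2vec text layout differs from GloVe mainly in its header line, `<vocab_size> <dim>`, which precedes the `word v1 v2 ...` lines. The sketch below reads that layout in Python 3. `read_word2vec_text_sketch` is a hypothetical name, and the dimension check is an illustrative choice rather than documented library behavior.

```python
import io


def read_word2vec_text_sketch(fin):
    """Hypothetical reader for the word2vec text layout; returns (rows, vocab)."""
    vocab_size, dim = (int(x) for x in fin.readline().split())
    vocab, rows = {}, []
    for index in range(vocab_size):
        # word2vec leaves a trailing space after the last value; rstrip drops it.
        fields = fin.readline().rstrip().split(b" ")
        if len(fields) != dim + 1:
            raise ValueError("expected %d values, got %d" % (dim, len(fields) - 1))
        vocab[fields[0]] = index
        rows.append([float(x) for x in fields[1:]])
    return rows, vocab


fin = io.BytesIO(b"2 3\nthe 0.1 0.2 0.3 \nof 0.4 0.5 0.6 \n")
rows, vocab = read_word2vec_text_sketch(fin)
```

Reading the header first means the expected number of rows and columns is known before any vector is parsed.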
word_embedding_loader.loader.word2vec_text.check_valid(line0, line1)
Refer to word_embedding_loader.loader.glove.check_valid() for the API.
word_embedding_loader.loader.word2vec_text.load(fin, dtype=numpy.float32, max_vocab=None)
Refer to word_embedding_loader.loader.glove.load() for the API.
word_embedding_loader.loader.word2vec_text.load_with_vocab(fin, vocab, dtype=numpy.float32)
Refer to word_embedding_loader.loader.glove.load_with_vocab() for the API.