Class IMDB
Dataset of 25,000 movie reviews from IMDB, labeled by sentiment (positive/negative). Reviews have been preprocessed, and each review is encoded as a sequence of word indexes (integers). For convenience, words are indexed by overall frequency in the dataset, so that, for instance, the integer "3" encodes the 3rd most frequent word in the data. This allows for quick filtering operations such as: "only consider the top 10,000 most common words, but eliminate the top 20 most common words". As a convention, "0" does not stand for a specific word, but instead is used to encode any unknown word.
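The filtering operation described above can be sketched in plain C#. This is a minimal illustration that works on raw frequency ranks and does not call the library. The class `ImdbFilterSketch`, the method `FilterByRank`, and its parameter names are hypothetical, and the exact boundary handling inside `LoadData` may differ.

```csharp
using System;
using System.Linq;

public static class ImdbFilterSketch
{
    // Hypothetical helper: keeps ranks in the range (skipTop, numWords] and
    // replaces every other rank with unknownIndex (0 by the convention above).
    public static int[] FilterByRank(int[] review, int numWords, int skipTop, int unknownIndex = 0)
    {
        return review
            .Select(rank => rank > skipTop && rank <= numWords ? rank : unknownIndex)
            .ToArray();
    }
}
```

For example, `ImdbFilterSketch.FilterByRank(new[] { 1, 25, 9000, 15000 }, 10000, 20)` keeps the ranks 25 and 9000 and replaces the other two with 0.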
Namespace: Keras.Datasets
Assembly: Keras.dll
Syntax
public class IMDB : Base, IDisposable
Methods
GetWordIndex(String)
Gets the word index: a dictionary that maps each word to its integer index.
Declaration
public static Dictionary<string, int> GetWordIndex(string path = "imdb_word_index.json")
Parameters
Type | Name | Description |
---|---|---|
System.String | path | Path to the word index JSON file. |
Returns
Type | Description |
---|---|
System.Collections.Generic.Dictionary<System.String, System.Int32> | Dictionary mapping each word (string) to its index (integer). |
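A common use of the word index is to turn an encoded review back into text. The sketch below inverts a word-to-index dictionary and decodes a sequence. It assumes, as in the upstream Python Keras dataset, that the indexes in loaded sequences are shifted by `index_from` (default 3) relative to the dictionary values. The class `ImdbDecodeSketch`, the method `Decode`, and the tiny dictionary in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImdbDecodeSketch
{
    // Hypothetical helper: decodes a sequence of word indexes into text.
    // Assumption: sequence values are offset by indexFrom relative to wordIndex.
    public static string Decode(IEnumerable<int> encoded, Dictionary<string, int> wordIndex, int indexFrom = 3)
    {
        // Invert word -> index into index -> word.
        var indexToWord = wordIndex.ToDictionary(kv => kv.Value, kv => kv.Key);

        // Reserved values (start and out-of-vocabulary markers) have no word
        // in the dictionary and are rendered as "?".
        var words = encoded.Select(i =>
            indexToWord.TryGetValue(i - indexFrom, out var word) ? word : "?");

        return string.Join(" ", words);
    }
}
```

With a dictionary such as `{ "the": 1, "movie": 2, "great": 3 }`, the sequence `{ 1, 4, 5, 2, 6 }` decodes to `? the movie ? great`. In practice the dictionary would come from `IMDB.GetWordIndex()`.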
LoadData(String, Nullable<Int32>, Int32, Nullable<Int32>, Int32, Int32, Int32, Int32)
Loads the data.
Declaration
public static ((NDarray, NDarray), (NDarray, NDarray)) LoadData(string path = "imdb.npz", int? num_words = default(int?), int skip_top = 0, int? maxlen = default(int?), int seed = 113, int start_char = 1, int oov_char = 2, int index_from = 3)
Parameters
Type | Name | Description |
---|---|---|
System.String | path | If you do not have the data locally (at '~/.keras/datasets/' + path), it will be downloaded to this location. |
System.Nullable<System.Int32> | num_words | Integer or null. Number of most frequent words to consider. Any less frequent word will appear as the oov_char value in the sequence data. |
System.Int32 | skip_top | Integer. Number of most frequent words to ignore (they will appear as the oov_char value in the sequence data). |
System.Nullable<System.Int32> | maxlen | Integer or null. Maximum sequence length. Any longer sequence will be truncated. |
System.Int32 | seed | Integer. Seed for reproducible data shuffling. |
System.Int32 | start_char | Integer. The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character. |
System.Int32 | oov_char | Integer. Words that were cut out because of the num_words or skip_top limit will be replaced with this character. |
System.Int32 | index_from | Integer. Index actual words with this index and higher. |
Returns
Type | Description |
---|---|
System.ValueTuple<System.ValueTuple<Numpy.NDarray, Numpy.NDarray>, System.ValueTuple<Numpy.NDarray, Numpy.NDarray>> | Tuple of NDarrays: ((x_train, y_train), (x_test, y_test)). |
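To illustrate how `start_char`, `oov_char`, and `index_from` shape a sequence, here is a minimal sketch in plain C# that does not call the library. It assumes the behavior of the upstream Python Keras dataset: each sequence begins with `start_char`, and word ranks are shifted up by `index_from`. The class `ImdbEncodeSketch` and the method `Encode` are hypothetical. Mapping words that are missing from the dictionary to `oov_char` is an illustrative choice; in `LoadData` that value marks words removed by `num_words` or `skip_top`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImdbEncodeSketch
{
    // Hypothetical helper: encodes words the way the parameters above describe.
    public static int[] Encode(
        IEnumerable<string> words,
        Dictionary<string, int> wordIndex,
        int startChar = 1,
        int oovChar = 2,
        int indexFrom = 3)
    {
        // Every sequence begins with the start marker.
        var encoded = new List<int> { startChar };

        foreach (var word in words)
        {
            // Known words are shifted by indexFrom so they never collide with
            // the reserved values; unknown words become oovChar.
            encoded.Add(wordIndex.TryGetValue(word, out var rank) ? rank + indexFrom : oovChar);
        }

        return encoded.ToArray();
    }
}
```

With the dictionary `{ "the": 1, "movie": 2 }`, the words `the unseen movie` encode to `{ 1, 4, 2, 5 }`: the start marker, `the` shifted to 4, the out-of-vocabulary marker, and `movie` shifted to 5.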