Class SequenceUtil
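Utility methods for preprocessing sequence data: padding, skip-gram pair generation, and word-rank sampling tables.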
Implements
System.IDisposable
Namespace: Keras.PreProcessing.sequence
Assembly: Keras.dll
Syntax
public class SequenceUtil : Base, IDisposable
Methods
MakeSamplingTable(Int32, Single)
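Generates a word rank-based probabilistic sampling table. Used for generating the sampling_table argument for SkipGrams. sampling_table[i] is the probability of sampling the i-th most common word in a dataset (more common words should be sampled less frequently, for balance). The sampling probabilities are generated according to the sampling distribution used in word2vec: p(word) = min(1, sqrt(word_frequency / sampling_factor) / (word_frequency / sampling_factor)). Word frequencies are assumed to follow Zipf's law (s = 1), which gives the numerical approximation frequency(rank) ~ 1 / (rank * (log(rank) + gamma) + 1/2 - 1/(12 * rank)), where gamma is the Euler-Mascheroni constant.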
Declaration
public static NDarray MakeSamplingTable(int size, float sampling_factor = 1E-05F)
Parameters
Type | Name | Description |
---|---|---|
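System.Int32 | size | Number of possible words to sample, i.e. the length of the returned table. |
System.Single | sampling_factor | The sampling factor in the word2vec formula. Defaults to 1e-5. |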
Returns
Type | Description |
---|---|
Numpy.NDarray | A 1D Numpy array of length size where the ith entry is the probability that a word of rank i should be sampled. |
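The formula can be easier to follow as code. The following is a minimal sketch in plain C# of the word2vec sampling distribution under the Zipf approximation, not the Keras.NET implementation. The class `SamplingTableSketch`, the use of `double[]` instead of `NDarray`, the rounded value 0.577 for gamma, and the handling of rank 0 are assumptions made for illustration.

```csharp
using System;

public static class SamplingTableSketch
{
    // Hypothetical helper that mirrors the documented formula.
    public static double[] MakeSamplingTable(int size, double samplingFactor = 1e-5)
    {
        const double gamma = 0.577; // rounded Euler-Mascheroni constant (assumed rounding)
        var table = new double[size];
        for (int i = 0; i < size; i++)
        {
            // Rank 0 is treated like rank 1 so that log(rank) is defined (assumption).
            double rank = Math.Max(i, 1);
            // Zipf-based approximation of 1 / frequency(rank).
            double invFrequency = rank * (Math.Log(rank) + gamma) + 0.5 - 1.0 / (12.0 * rank);
            // f = sampling_factor / word_frequency.
            double f = samplingFactor * invFrequency;
            // min(1, sqrt(f)) is algebraically the same as
            // min(1, sqrt(wf / sf) / (wf / sf)) with wf = word_frequency, sf = sampling_factor.
            table[i] = Math.Min(1.0, Math.Sqrt(f));
        }
        return table;
    }
}
```

In this sketch the probabilities grow with rank, so frequent (low-rank) words are kept less often than rare ones, which is the balancing effect described above.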
PadSequences(NDarray, Nullable<Int32>, String, String, String, Single)
Pads sequences to the same length. This function transforms a list of num_samples sequences (lists of integers) into a 2D Numpy array of shape (num_samples, num_timesteps). num_timesteps is either the maxlen argument if provided, or the length of the longest sequence otherwise. Sequences shorter than num_timesteps are padded with value until they reach that length. Sequences longer than num_timesteps are truncated so that they fit the desired length. The position where padding or truncation happens is determined by the arguments padding and truncating, respectively. Pre-padding and pre-truncating (at the beginning of each sequence) are the defaults.
Declaration
public static NDarray PadSequences(NDarray sequences, int? maxlen = default(int? ), string dtype = "int32", string padding = "pre", string truncating = "pre", float value = 0F)
Parameters
Type | Name | Description |
---|---|---|
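Numpy.NDarray | sequences | The sequences to pad; each element is one sequence of integers. |
System.Nullable<System.Int32> | maxlen | Maximum length of all sequences. If null, the length of the longest sequence is used. |
System.String | dtype | Type of the output sequences. Defaults to "int32". |
System.String | padding | "pre" or "post": pad either before or after each sequence. |
System.String | truncating | "pre" or "post": remove values from sequences longer than maxlen, either at the beginning or at the end. |
System.Single | value | The padding value. Defaults to 0. |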
Returns
Type | Description |
---|---|
Numpy.NDarray | Numpy array with shape (len(sequences), maxlen) |
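The interaction between padding and truncating is easiest to see on a small input. The sketch below reimplements the described behavior in plain C# on jagged arrays. It is an illustration under stated assumptions rather than the Keras.NET code: the class `PadSequencesSketch`, the method `Pad`, and the `int[][]` / `int[,]` types are hypothetical stand-ins for the `NDarray`-based API, and the dtype argument is omitted.

```csharp
using System;

public static class PadSequencesSketch
{
    // Hypothetical helper: pads or truncates each row to a common length.
    public static int[,] Pad(int[][] sequences, int? maxlen = null,
        string padding = "pre", string truncating = "pre", int value = 0)
    {
        // num_timesteps is maxlen if given, else the longest sequence length.
        int numTimesteps = maxlen ?? 0;
        if (maxlen == null)
            foreach (var s in sequences) numTimesteps = Math.Max(numTimesteps, s.Length);

        var result = new int[sequences.Length, numTimesteps];
        for (int row = 0; row < sequences.Length; row++)
        {
            var seq = sequences[row];
            int keep = Math.Min(seq.Length, numTimesteps);
            // "pre" truncation drops leading items, anything else drops trailing ones.
            int srcStart = truncating == "pre" ? seq.Length - keep : 0;
            // "pre" padding right-aligns the kept items, anything else left-aligns them.
            int dstStart = padding == "pre" ? numTimesteps - keep : 0;

            for (int col = 0; col < numTimesteps; col++) result[row, col] = value;
            for (int k = 0; k < keep; k++) result[row, dstStart + k] = seq[srcStart + k];
        }
        return result;
    }
}
```

With the input `{ {1}, {2, 3}, {4, 5, 6} }`, the defaults give rows `0 0 1`, `0 2 3`, `4 5 6`. With `maxlen: 2` the rows become `0 1`, `2 3`, `5 6`, and with `maxlen: 2, padding: "post", truncating: "post"` they become `1 0`, `2 3`, `4 5`.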
SkipGrams(NDarray, Int32, Int32, Single, Boolean, Boolean, NDarray, Nullable<Int32>)
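Generates skip-gram word pairs. This function transforms a sequence of word indices into couples of the form (word, word in the same window) with label 1 (positive samples), and (word, random word from the vocabulary) with label 0 (negative samples).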
Declaration
public static NDarray SkipGrams(NDarray sequence, int vocabulary_size, int window_size = 4, float negative_samples = 1F, bool shuffle = true, bool categorical = false, NDarray sampling_table = null, int? seed = default(int? ))
Parameters
Type | Name | Description |
---|---|---|
Numpy.NDarray | sequence | A word sequence (sentence), encoded as a list of word indices (integers). If using a sampling_table, word indices are expected to match the rank of the words in a reference dataset (e.g. 10 would encode the 10-th most frequently occurring token). Note that index 0 is expected to be a non-word and will be skipped. |
System.Int32 | vocabulary_size | Int, maximum possible word index + 1 |
System.Int32 | window_size | Int, size of sampling windows (technically half-window). The window of a word w_i will be [i - window_size, i + window_size+1]. |
System.Single | negative_samples | Float >= 0. 0 for no negative (i.e. random) samples. 1 for same number as positive samples. |
System.Boolean | shuffle | Whether to shuffle the word couples before returning them. |
System.Boolean | categorical | If false, labels are integers (e.g. [0, 1, 1, ...]); if true, labels are categorical (e.g. [[1, 0], [0, 1], [0, 1], ...]). |
Numpy.NDarray | sampling_table | 1D array of size vocabulary_size where the entry i encodes the probability to sample a word of rank i. |
System.Nullable<System.Int32> | seed | Random seed. |
Returns
Type | Description |
---|---|
Numpy.NDarray | couples, labels: where couples are int pairs and labels are either 0 or 1. |
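To make the window and negative-sampling parameters concrete, here is a simplified sketch in plain C# of how such couples and labels could be built. It is not the Keras.NET implementation. The class `SkipGramsSketch` and its `Generate` method are hypothetical, and the sketch deliberately leaves out shuffling, the categorical label format, and the sampling_table filter.

```csharp
using System;
using System.Collections.Generic;

public static class SkipGramsSketch
{
    // Hypothetical helper: returns positive couples first, then negative ones.
    public static (List<int[]> Couples, List<int> Labels) Generate(
        int[] sequence, int vocabularySize, int windowSize = 4,
        double negativeSamples = 1.0, int seed = 0)
    {
        var rng = new Random(seed);
        var couples = new List<int[]>();
        var labels = new List<int>();

        for (int i = 0; i < sequence.Length; i++)
        {
            int wi = sequence[i];
            if (wi == 0) continue; // index 0 is a non-word and is skipped

            int start = Math.Max(0, i - windowSize);
            int end = Math.Min(sequence.Length, i + windowSize + 1); // exclusive bound
            for (int j = start; j < end; j++)
            {
                if (j == i || sequence[j] == 0) continue;
                couples.Add(new[] { wi, sequence[j] });
                labels.Add(1); // positive sample: both words share a window
            }
        }

        int positives = couples.Count;
        int negatives = (int)(positives * negativeSamples);
        for (int k = 0; k < negatives; k++)
        {
            // Reuse a target word from the positive couples and pair it
            // with a random word index in [1, vocabularySize - 1].
            int word = couples[k % positives][0];
            couples.Add(new[] { word, rng.Next(1, vocabularySize) });
            labels.Add(0); // negative sample
        }
        return (couples, labels);
    }
}
```

For the sequence `{ 1, 2, 3 }` with `windowSize: 1`, the positive couples in this sketch are (1, 2), (2, 1), (2, 3) and (3, 2). With `negativeSamples` left at 1, four negative couples follow, so eight couples are returned in total.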