Data Layers

class HDF5DataLayer

Load data from a list of HDF5 files and feed it to the upper layers in mini-batches. The layer wraps around automatically and reports an epoch each time it completes a full pass over the listed data sources. Data is read in order unless shuffling is explicitly enabled (it is off by default).

Each dataset in the HDF5 file should be an N-dimensional tensor. The last tensor dimension (the slowest-changing one) is treated as the number dimension and is split into mini-batches. For more details on the ND-tensor blobs used in Mocha, see Blob.
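
The splitting and round wrapping described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not Mocha's implementation: next_batch is a hypothetical helper, and the array shape is made up for the example.

```julia
# Hypothetical helper, not part of Mocha: returns the next mini-batch taken
# along the last dimension, the new read position, and the epoch counter.
function next_batch(data::AbstractArray, pos::Int, batch_size::Int, epoch::Int)
    n = size(data, ndims(data))            # size of the last ("number") dimension
    idx = Vector{Int}(undef, batch_size)
    for i in 1:batch_size
        idx[i] = pos
        pos += 1
        if pos > n                         # passed the last sample: wrap and count an epoch
            pos = 1
            epoch += 1
        end
    end
    batch = selectdim(data, ndims(data), idx)
    return batch, pos, epoch
end

# 2x2 "images", 1 channel, 5 samples: sample k is filled with the value k
data = reshape(repeat(Float32.(1:5), inner=4), 2, 2, 1, 5)

b1, pos, epoch = next_batch(data, 1, 3, 0)       # samples 1, 2, 3
b2, pos, epoch = next_batch(data, pos, 3, epoch) # samples 4, 5, then wraps to 1
```

With 5 samples and a batch size of 3, the second batch crosses the end of the data, so it is completed from the beginning and the epoch counter advances.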

The numerical type of each HDF5 dataset should be either Float32 or Float64. Even for multi-class labels, the integer class indicators should still be stored as floating point.


For multi-class labels with N classes, the labels should be numerical values from 0 to N-1, even though Julia uses 1-based indexing (see SoftmaxLossLayer).
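
As a small sketch of the label convention, the following converts 1-based Julia class indices to 0-based floating-point labels. The variable names and the 1 x N shape of the label array are assumptions made for illustration.

```julia
# Class indices as Julia would naturally hold them: 1-based integers in 1:N
classes = [3, 1, 2, 3]
n_classes = 3

# Shift to 0..N-1 and store as floating point, as the text above requires
labels = Float32.(classes .- 1)

# One label per sample along the last dimension (here a 1 x 4 array)
label_blob = reshape(labels, 1, length(labels))
```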

The HDF5 dataset format is compatible with Caffe. If you want to compare the results of Mocha with those of Caffe on the same data, you can use Caffe’s HDF5 Data Layer to read from the same HDF5 files that Mocha is using.


File name of the data source. The source should be a text file in which each line gives the name of an HDF5 file to load.
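
A minimal sketch of producing such a source file in plain Julia follows. The file names are hypothetical, and the HDF5 files themselves are not created here.

```julia
# Hypothetical file names; replace with your own HDF5 files
dir = mktempdir()
hdf5_files = [joinpath(dir, "train_part1.hdf5"), joinpath(dir, "train_part2.hdf5")]

# The source is a plain text file with one HDF5 file name per line
source_file = joinpath(dir, "train.txt")
open(source_file, "w") do io
    for f in hdf5_files
        println(io, f)
    end
end

# Reading it back gives the list of files in the order they were written
listed = readlines(source_file)
```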


The number of data samples in each mini batch.


Default [:data, :label]. List of symbols specifying the names of the blobs to feed to the top layers. The names also correspond to the datasets to load from the HDF5 files listed in the data source.


Default []. List of data transformers. Each entry in the list should be a tuple of (name, transformer), where name is a symbol naming the corresponding output blob, and transformer is a data transformer to apply to the blob with that name. Multiple transformers can be given for the same blob, and they are applied in the order provided here.
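
The ordering rule can be illustrated with a plain-Julia sketch. The "transformers" below are ordinary functions standing in for real data transformers, and apply_transformers is a hypothetical helper; neither is part of Mocha.

```julia
# Stand-in "transformers": plain functions, NOT Mocha's transformer types
scale_by(c) = x -> x .* c
subtract(m) = x -> x .- m

# List of (name, transformer) tuples; :data has two entries, applied in list order
transformers = [(:data, subtract(10f0)), (:data, scale_by(0.5f0))]

# Hypothetical helper: apply every transformer registered for `name`, in order
function apply_transformers(name::Symbol, blob, transformers)
    for (target, t) in transformers
        if target == name
            blob = t(blob)
        end
    end
    return blob
end

data  = Float32[10 20; 30 40]
label = Float32[0 1]

out_data  = apply_transformers(:data, data, transformers)    # (x - 10) * 0.5
out_label = apply_transformers(:label, label, transformers)  # no entry for :label, unchanged
```

Swapping the two :data entries would compute x * 0.5 - 10 instead, which is why the list order matters.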


Default false. When enabled, the data is randomly shuffled. Shuffling is useful in training, but it is not needed for testing. Shuffled access is slightly slower, and it requires the HDF5 dataset to be mmappable; for example, the dataset can be neither chunked nor compressed. Please refer to the documentation of HDF5.jl for more details.
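
Putting the properties above together, a layer definition might look like the following configuration sketch. The keyword names (name, source, batch_size, tops, shuffle) are assumed from Mocha's keyword-style layer definitions and are not spelled out in the text above, so check them against your installed version. The layer name and file path are hypothetical.

```julia
using Mocha

# Assumed keyword names; "train-data" and "data/train.txt" are placeholders
data_layer = HDF5DataLayer(
    name = "train-data",
    source = "data/train.txt",   # text file listing one HDF5 file per line
    batch_size = 64,
    tops = [:data, :label],      # dataset names to read from each HDF5 file
    shuffle = true,              # requires mmappable (unchunked, uncompressed) datasets
)
```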

class MemoryDataLayer

Wrap an in-memory Julia Array as data source. Useful for testing.


List of symbols specifying the names of the blobs to produce.


The number of data samples in each mini batch.


List of Julia Arrays. The count should be equal to the number of tops, where each Array acts as the data source for each blob.


Default []. See transformers of HDF5DataLayer.
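
The following plain-Julia sketch prepares in-memory arrays of the kind described above and checks them before they are handed to the layer. The shapes are made up for the example, and the requirement that all arrays share the same number of samples along the last dimension is an assumption based on how mini-batches are split, not something stated above.

```julia
tops = [:data, :label]
batch_size = 4

# 8 samples of 3 features each, plus one 0-based label per sample, stored as Float32
X = rand(Float32, 3, 8)
Y = reshape(Float32.(rand(0:1, 8)), 1, 8)
data = Array[X, Y]

# One array per top
@assert length(data) == length(tops)

# Assumed requirement: all arrays agree on the size of the last dimension
n_samples = [size(a, ndims(a)) for a in data]
@assert all(n -> n == n_samples[1], n_samples)

# Number of mini-batches needed to cover the data once
n_batches = cld(n_samples[1], batch_size)
```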