dossier.fc is a package that provides an implementation of feature
collections. Abstractly, a feature collection is a map from feature name to
a feature. While any type of feature can be supported, the core focus of this
package is on multisets or "bags of words" (BOW). In this package, the
default multiset implementation is a StringCounter, which maps Unicode
strings to counts.
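
The idea of a map from feature names to multisets can be sketched with the Python standard library alone. This is a minimal illustration, not the dossier.fc API: a plain dict stands in for the feature collection, collections.Counter stands in for StringCounter, and the helper bag_of_words and the feature names "title" and "body" are hypothetical.

```python
from collections import Counter

# Hypothetical stand-ins, NOT the dossier.fc classes: a plain dict plays
# the role of the feature collection and collections.Counter plays the
# role of a StringCounter (Unicode string -> count).
def bag_of_words(text):
    """Count whitespace-separated, lowercased tokens in a Unicode string."""
    return Counter(text.lower().split())

# A feature collection maps each feature name to a feature (here, a multiset).
feature_collection = {
    "title": bag_of_words("Feature collections"),
    "body": bag_of_words("a bag of words is a multiset of words"),
}

# Multisets add element-wise, so merging two bags sums their counts.
merged = feature_collection["title"] + feature_collection["body"]
```

Looking up a string that never occurred returns a count of zero rather than raising an error, which is the behavior one usually wants from a bag of words.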
Both Python and Java implementations are provided.
For other languages, we also document the CBOR-based binary format of feature collections.