def setupNetData(self, generateData=True, seed=42, preprocess=False, **kwargs):
    """
    Generate the network data files used for this experiment.

    Resulting network data files created:
      - One test file for each bucket
      - One training file for each training rep, where samples are not
        repeated within a given file

    Each sample is given its own category (_category = _sequenceId). The
    classification json is saved when generating the final training file.

    @param generateData (bool)  If False, raises NotImplementedError; data
                                generation is currently the only supported path.
    @param seed         (int)   Seed passed to ndg.randomizeData() when
                                self.orderedSplit is False.
    @param preprocess   (bool)  Forwarded to NetworkDataGenerator.split() as
                                textPreprocess.
    @param kwargs               Additional args forwarded to ndg.split().

    Side effects: sets self.dataDict, self.classificationFile, and
    self.trainingDicts; appends to self.bucketFiles and self.dataFiles;
    writes CSV/json files next to self.dataPath. Calls self.mapLabelRefs()
    on exit.
    """
    if generateData:
        ndg = NetworkDataGenerator()
        self.dataDict = ndg.split(
            filePath=self.dataPath, numLabels=1, textPreprocess=preprocess,
            **kwargs)

        filename, ext = os.path.splitext(self.dataPath)
        self.classificationFile = "{}_categories.json".format(filename)

        # Generate test data files: one network data file for each bucket.
        bucketFilePaths = bucketCSVs(self.dataPath)
        for bucketFile in bucketFilePaths:
            ndg.reset()
            ndg.split(
                filePath=bucketFile, numLabels=1, textPreprocess=preprocess,
                **kwargs)
            bucketFileName, ext = os.path.splitext(bucketFile)
            if not self.orderedSplit:
                # the sequences will be written to the file in random order
                ndg.randomizeData(seed)
            dataFile = "{}_network{}".format(bucketFileName, ext)
            # the classification file here gets (correctly) overwritten later
            ndg.saveData(dataFile, self.classificationFile)
            self.bucketFiles.append(dataFile)

        # Generate training data file(s).
        # Build one dict of unique samples, keyed by a fresh sequential seqID;
        # dataEntry[2] is assumed to be the sample's unique ID — TODO confirm
        # against NetworkDataGenerator.split()'s return format.
        self.trainingDicts = []
        uniqueDataDict = OrderedDict()
        included = []
        seqID = 0
        for dataEntry in self.dataDict.values():
            uniqueID = dataEntry[2]
            if uniqueID not in included:
                # skip over the samples that are repeated in multiple buckets
                uniqueDataDict[seqID] = dataEntry
                included.append(uniqueID)
                seqID += 1
        self.trainingDicts.append(uniqueDataDict)

        ndg.reset()
        ndg.split(
            dataDict=uniqueDataDict, numLabels=1, textPreprocess=preprocess,
            **kwargs)
        for rep in xrange(self.trainingReps):
            # use a different file for each training rep
            if not self.orderedSplit:
                # NOTE(review): the same seed is passed on every rep — if
                # randomizeData() reseeds internally, all reps share one
                # ordering; confirm this is intended.
                ndg.randomizeData(seed)
            ndg.stripCategories()  # replace the categories w/ seqId
            dataFile = "{}_network_training_{}{}".format(filename, rep, ext)
            ndg.saveData(dataFile, self.classificationFile)
            self.dataFiles.append(dataFile)
        # TODO: maybe add a method (and arg) for removing all these data files

    else:
        # TODO (only if needed)
        raise NotImplementedError("Must generate data.")

    # labels references match the classification json
    self.mapLabelRefs()