    def sample(self, n):
        """Sample a batch of n experiences, weighted by score."""
        if len(self.memory) < n:
            raise IndexError('Size of memory ({}) is less than requested sample ({})'.format(len(self.memory), n))
        # Draw indices without replacement, with probability proportional to score.
        # Cast scores to an array so the division yields valid probabilities.
        scores = np.array([x[1] for x in self.memory], dtype=float)
        sample = np.random.choice(len(self.memory), size=n, replace=False, p=scores / np.sum(scores))
        sample = [self.memory[i] for i in sample]
        smiles = [x[0] for x in sample]
        scores = [x[1] for x in sample]
        prior_likelihood = [x[2] for x in sample]
        tokenized = [self.voc.tokenize(smile) for smile in smiles]
        encoded = [self.voc.encode(tokenized_i) for tokenized_i in tokenized]
        encoded = Dataset.collate_fn(encoded)
        return encoded, np.array(scores), np.array(prior_likelihood)
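A minimal, self-contained sketch of the weighted sampling that sample() performs: indices are drawn without replacement with probability proportional to each entry's score, so higher-scoring experiences are favoured. The toy memory list and the names below are illustrative, not part of the class.

```python
import numpy as np

# Each entry mimics (smiles, score, prior_likelihood); values are made up.
memory = [("CCO", 0.9, -12.3), ("c1ccccc1", 0.1, -15.7), ("CCN", 0.5, -11.0)]

scores = np.array([x[1] for x in memory], dtype=float)
rng = np.random.default_rng(0)  # seeded for reproducibility

# Score-proportional sampling without replacement, as in sample().
idx = rng.choice(len(memory), size=2, replace=False, p=scores / scores.sum())
batch = [memory[i] for i in idx]
```

Because `replace=False`, the two sampled experiences are always distinct; entries with score 0.9 are drawn far more often than those with 0.1.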
    def initiate_from_file(self, fname, scoring_function, Prior):
        """Adds experience from a file with SMILES.

        Needs a scoring function and an RNN (Prior) to score the sequences."""
        smiles = []
        with open(fname, 'r') as f:
            for line in f:
                parts = line.split()
                if not parts:
                    continue  # skip blank lines
                # Parse once; keep only valid SMILES, canonicalized without stereochemistry.
                mol = Chem.MolFromSmiles(parts[0])
                if mol:
                    smiles.append(Chem.MolToSmiles(mol, isomericSmiles=False))
        scores = scoring_function(smiles)
        tokenized = [self.voc.tokenize(smile) for smile in smiles]
        encoded = [self.voc.encode(tokenized_i) for tokenized_i in tokenized]
        encoded = Dataset.collate_fn(encoded)
        prior_likelihood, _ = Prior.likelihood(encoded.long())
        prior_likelihood = prior_likelihood.data.cpu().numpy()
        new_experience = zip(smiles, scores, prior_likelihood)
        self.add_experience(new_experience)
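A hedged sketch of the file-parsing step in initiate_from_file(): take the first whitespace-separated token of each line as the SMILES and keep only entries that pass a validity check. RDKit's Chem.MolFromSmiles is replaced by a stand-in predicate (looks_like_smiles) so the sketch runs without RDKit installed; the temporary file and its contents are illustrative.

```python
import os
import tempfile

def looks_like_smiles(s):
    # Stand-in for Chem.MolFromSmiles: reject empty and comment tokens.
    return bool(s) and not s.startswith("#")

# Write a small .smi-style file: SMILES first, optional name after.
with tempfile.NamedTemporaryFile("w", suffix=".smi", delete=False) as f:
    f.write("CCO ethanol\n# comment line\nc1ccccc1 benzene\n")
    fname = f.name

smiles = []
with open(fname) as f:
    for line in f:
        parts = line.split()
        if parts and looks_like_smiles(parts[0]):
            smiles.append(parts[0])
os.unlink(fname)

print(smiles)  # ['CCO', 'c1ccccc1']
```

Only the first token of each line is used, so names or scores after the SMILES are ignored, matching the `line.split()[0]` behaviour above.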