def createMatrix(tweetText, tokenizer, maxVectorLength, pretrainedModel,
                 hatebase_dic):
    """Build a 2-D embedding matrix combining BERT token ids and the
    hatebase dictionary metric.

    Args:
        tweetText (str): The tweet text. Non-string values (NaN floats from
            empty cells) are converted with str().
        tokenizer (object): Tokenizer of the chosen pretrained BERT model.
        maxVectorLength (int): Maximum vector length of the embeddings.
        pretrainedModel (str): Which pretrained BERT model to use.
        hatebase_dic (dataframe): Hatebase dictionary as a pandas dataframe.

    Returns:
        torch.Tensor: Shape (1, 2*maxVectorLength) after unsqueeze — BERT
        encoding concatenated with the zero-padded hate metric.
    """
    # Empty cells in the data arrive as float NaN; coerce them to strings.
    if isinstance(tweetText, float):
        print("Float tweet found in data: \"" + str(tweetText)
              + "\" --> interpreting it as string with str(tweet)")
        tweetText = str(tweetText)

    raw_encoding = torch.tensor(
        tokenizer.encode(tweetText, max_length=maxVectorLength))
    # Length of the *unpadded* encoding drives the stretch of the metric.
    vlength = raw_encoding.size()[0]
    encoding = padWithZeros(raw_encoding, maxVectorLength)

    hate_scores = hatesearch(tweetText, hatebase_dic)
    hateMetric = padWithZeros(stretch(hate_scores, vlength), maxVectorLength)

    return torch.cat((encoding, hateMetric), 0).unsqueeze(0)
def test_HateSearch(self):
    """Tests that hatesearch produces the expected score tensor for one
    example tweet.

    NOTE(review): self.assertEqual on two multi-element tensors raises
    "Boolean value of Tensor with more than one element is ambiguous";
    element-wise comparison via torch.equal is required instead.
    """
    input_tweet = "how could i be a f*g but i like bitches please tell me"
    function_output = hatesearch(data=input_tweet)
    ideal_output = torch.tensor([
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 87.5849, 0.0000, 0.0000,
        85.0000, 25.0000, 0.0000, 0.0000, 0.0000
    ])
    # torch.equal checks shape and all elements at once.
    self.assertTrue(torch.equal(function_output, ideal_output))
def test_HateSearch(self):
    """Tests the hatesearch function from M1_5_dictionary_approach_tweetlevel
    creates correct tensors based on one example.
    """
    tweet = "how could i be a f*g but i like bitches please tell me"
    expected = torch.tensor([
        0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        87.58489525909593, 0.0000, 0.0000,
        85.0000, 25.0000, 0.0000, 0.0000, 0.0000,
    ])
    actual = hatesearch(data=tweet)
    # Exact element-wise equality, including shape.
    self.assertTrue(torch.equal(actual, expected))
def createMatrix(tweetText, tokenizer, maxVectorLength, pretrainedModel,
                 hatebase_dic):
    """Build a 2-D embedding matrix combining BERT token ids and the
    hatebase dictionary metric.

    Args:
        tweetText (str): The tweet text. Non-string values (NaN floats from
            empty cells) are converted with str().
        tokenizer (object): Tokenizer of the chosen pretrained BERT model.
        maxVectorLength (int): Maximum vector length of the embeddings.
        pretrainedModel (str): Which pretrained BERT model to use.
        hatebase_dic (dataframe): Hatebase dictionary as a pandas dataframe.

    Returns:
        torch.Tensor: BERT encoding concatenated with the zero-padded hate
        metric, with a leading batch dimension of 1.
    """
    if isinstance(tweetText, float):
        # Empty values in the data are read as float NaN.
        print("Float tweet found in data: \"" + str(tweetText)
              + "\" --> interpreting it as string with str(tweet)")
        tweetText = str(tweetText)

    # BUGFIX: vlength must be the length of the *raw* (unpadded) encoding.
    # Previously it was taken after padWithZeros, so it always equaled
    # maxVectorLength and stretch() received the wrong target length.
    # Also use torch.tensor (not torch.Tensor) so integer token ids keep
    # an integer dtype, consistent with the other createMatrix variant.
    raw_encoding = torch.tensor(
        tokenizer.encode(tweetText, max_length=maxVectorLength))
    vlength = raw_encoding.size()[0]
    encoding = padWithZeros(raw_encoding, maxVectorLength)

    hateMetric = padWithZeros(
        stretch(hatesearch(tweetText, hatebase_dic), vlength),
        maxVectorLength)
    matrix = torch.cat((encoding, hateMetric), 0).unsqueeze(0)
    return matrix