Python stringToWordDictionary Examples

Programming language: Python

Namespace/package name: repool_util

Method/function: stringToWordDictionary

Examples at hotexamples.com: 3

Python stringToWordDictionary - 3 examples found. These are the top-rated real-world Python examples of repool_util.stringToWordDictionary, extracted from open source projects.

Example #1

def demo3():
    """
    You found a cool paper online and you want to find similar papers:
    1. Download and parse the pdf
    2. Compare to text of all publications in pubs_ database
    3. Open the top 3 matches in browser (but note that current matching alg is
                                          very basic and could be much improved)
    
    Pre-requisites:
    - Assumes 'pubs_nips' exists and contains pdf text inside 
      (under key 'pdf_text'). This can be obtained by running 
      nips_download_parse.py and then nips_add_pdftext.py 
      or by downloading it from site.
      (https://sites.google.com/site/researchpooler/home)
    
    Side-effects:
    - will use os call to open a pdf with default program
    """

    # fetch this pdf from website, parse it, and make a publication dict from it
    # here is a random pdf from Andrew's website
    url = 'http://ai.stanford.edu/~ang/papers/icml11-DeepEnergyModels.pdf'
    print "downloading %s..." % (url, )
    text = convertPDF(url)  #extract the text
    bow = stringToWordDictionary(text)  #extract the bag of words representation
    p = {'pdf_text': bow}  #create a dummy publication dict

    # calculate similarities to our publications
    print "loading database..."
    pubs = loadPubs('pubs_nips')
    print "computing similarities. (may take while with current implementation)"
    scores = publicationSimilarityNaive(pubs, p)

    # find highest scoring pubs
    lst = [(s, i) for i, s in enumerate(scores) if s >= 0]
    lst.sort(reverse=True)

    # display top 50 matches
    m = min(50, len(lst))
    for s, i in lst[:m]:
        print "%.2f is similarity to %s." % (s, pubs[i]['title'])

    #open the top 3 in browser
    print "opening the top 3..."
    openPDFs([pubs[i]['pdf'] for s, i in lst[:3]])
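
The example above depends on stringToWordDictionary to turn the extracted PDF text into a bag of words, but the function body from repool_util is not shown on this page. The following is only a minimal sketch of what such a conversion could look like, assuming it lowercases the text, keeps runs of ASCII letters, and counts each word. The name string_to_word_counts and the min_length parameter are hypothetical and not part of repool_util.

```python
import re
from collections import Counter


def string_to_word_counts(text, min_length=3):
    """Hypothetical stand-in for stringToWordDictionary.

    Lowercases the text, extracts runs of ASCII letters, drops words
    shorter than min_length, and returns a {word: count} dict.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(w for w in words if len(w) >= min_length))


counts = string_to_word_counts("Deep energy models; deep nets of it")
```

A plain dict keyed by word keeps the result easy to pickle alongside the rest of a publication record, which fits how the examples store it under 'pdf_text'.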

Example #2

File: demo3.py Project: Shivamagrawal2014/researchpooler

def demo3():
    """
    You found a cool paper online and you want to find similar papers:
    1. Download and parse the pdf
    2. Compare to text of all publications in pubs_ database
    3. Open the top 3 matches in browser (but note that current matching alg is
                                          very basic and could be much improved)
    
    Pre-requisites:
    - Assumes 'pubs_nips' exists and contains pdf text inside 
      (under key 'pdf_text'). This can be obtained by running 
      nips_download_parse.py and then nips_add_pdftext.py 
      or by downloading it from site.
      (https://sites.google.com/site/researchpooler/home)
    
    Side-effects:
    - will use os call to open a pdf with default program
    """
    
    # fetch this pdf from website, parse it, and make a publication dict from it
    # here is a random pdf from Andrew's website
    url = 'http://ai.stanford.edu/~ang/papers/icml11-DeepEnergyModels.pdf'
    print "downloading %s..." % (url,)
    text = convertPDF(url) #extract the text
    bow = stringToWordDictionary(text) #extract the bag of words representation
    p = {'pdf_text' : bow} #create a dummy publication dict
    
    # calculate similarities to our publications
    print "loading database..."
    pubs = loadPubs('pubs_nips')
    print "computing similarities. (may take while with current implementation)"
    scores = publicationSimilarityNaive(pubs, p)
    
    # find highest scoring pubs
    lst = [(s, i) for i,s in enumerate(scores) if s>=0]
    lst.sort(reverse = True)
    
    # display top 50 matches
    m = min(50, len(lst))
    for s, i in lst[:m]:
        print "%.2f is similarity to %s." % (s, pubs[i]['title'])
    
    #open the top 3 in browser
    print "opening the top 3..."
    openPDFs([pubs[i]['pdf'] for s,i in lst[:3]])
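
The scoring step in demo3 calls publicationSimilarityNaive, whose implementation is not included here. As a rough illustration of the score-then-rank pattern, the sketch below assumes similarity is the sum of the smaller count over shared words, and that publications without 'pdf_text' receive a score of -1 (which would explain the s >= 0 filter in the example). Both function names and both assumptions are hypothetical.

```python
def overlap_similarity(bow_a, bow_b):
    """Sum, over words present in both bags, of the smaller count."""
    return sum(min(count, bow_b[word])
               for word, count in bow_a.items() if word in bow_b)


def rank_publications(pubs, query, top_k=3):
    """Return up to top_k (score, index) pairs, best match first."""
    scores = []
    for pub in pubs:
        if 'pdf_text' in pub:
            scores.append(overlap_similarity(pub['pdf_text'], query['pdf_text']))
        else:
            scores.append(-1)  # assumed marker for "no text available"
    ranked = [(s, i) for i, s in enumerate(scores) if s >= 0]
    ranked.sort(reverse=True)
    return ranked[:top_k]


pubs = [
    {'title': 'A', 'pdf_text': {'deep': 2, 'nets': 1}},
    {'title': 'B'},  # no parsed text, so it is skipped
    {'title': 'C', 'pdf_text': {'deep': 1}},
]
query = {'pdf_text': {'deep': 3, 'nets': 1}}
top = rank_publications(pubs, query)
```

Sorting (score, index) tuples in reverse order mirrors the lst.sort(reverse=True) call in the example, so ties are broken by the higher index.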

Example #3

File: nips_add_pdftext.py Project: simcard0000/research-pooler

        try:
            floc = p['pdf'].index('NIPS')
            fname = p['pdf'][floc:]
            txt = convertPDF('downloads/'+fname)
            processed = True
            print 'found %s in file!' % (p['title'],)
        except:
            pass
            
        if not processed:
            # download the PDF and convert to text
            try:
                print 'downloading pdf for [%s] and parsing...' % (p.get('title', 'an un-titled paper'))
                txt = convertPDF(p['pdf'])
                processed = True
                print 'processed from url!'
            except:
                print 'error: unable to download the pdf from %s' % (p['pdf'],)
                print 'skipping...'
        
        if processed:
            # convert to bag of words and store
            try:
                p['pdf_text'] = stringToWordDictionary(txt)
            except:
                print 'was unable to convert text to bag of words. Skipped.'
                
        
    print '%d/%d = %.2f%% done.' % (i+1, len(pubs), 100*(i+1.0)/len(pubs))
    
savePubs('pubs_nips', pubs_all)
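
The fragment above starts mid-loop, so the surrounding setup is not visible. The fallback pattern it follows (try a local copy, then the URL, then store the bag of words) can be sketched in isolation as below. This is an illustration under assumptions, not the project's code: add_pdf_text is a hypothetical helper, and the convert and to_bag_of_words callables are injected stand-ins for convertPDF and stringToWordDictionary.

```python
def add_pdf_text(pub, convert, to_bag_of_words, local_dir="downloads/"):
    """Try a local copy first, then the URL; store the bag of words.

    Returns True if pub['pdf_text'] was set, False otherwise.
    """
    sources = []
    marker = pub["pdf"].find("NIPS")
    if marker != -1:
        # Assumed layout: local files are named from 'NIPS' onward
        sources.append(local_dir + pub["pdf"][marker:])
    sources.append(pub["pdf"])

    for source in sources:
        try:
            text = convert(source)
        except Exception:
            continue  # this source failed, fall back to the next one
        try:
            pub["pdf_text"] = to_bag_of_words(text)
            return True
        except Exception:
            return False  # text was extracted but could not be converted
    return False


calls = []

def fake_convert(source):
    """Test double: local files are missing, URLs succeed."""
    calls.append(source)
    if source.startswith("downloads/"):
        raise IOError("no local copy")
    return "deep deep nets"

def fake_bag(text):
    bag = {}
    for word in text.split():
        bag[word] = bag.get(word, 0) + 1
    return bag

pub = {"title": "Example", "pdf": "http://example.org/NIPS2010_0001.pdf"}
ok = add_pdf_text(pub, fake_convert, fake_bag)
```

Catching Exception rather than using a bare except keeps KeyboardInterrupt working during a long batch run, which is one possible refinement over the pattern in the fragment.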