from urlparse import urljoin


def expand_urls(page, urls):
    '''expand all urls in the page'''
    parent = base_url(page.strip())
    # print 'parent=', parent
    urls = [urljoin(parent, url.strip()) for url in urls]
    rets = []
    for url in urls:
        try:
            nurl = normalize_url(url)
        except Exception, e:
            # use % formatting; a comma here would print the tuple itself
            print 'error when normalize_url %s: %s' % (url, e)
            continue
        rets.append(nurl)
    return rets
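
# `base_url` is referenced above but not defined in this snippet. A minimal
# sketch of what it presumably does (an assumption, not the original code):
# strip the query and fragment so the remaining scheme/netloc/path of the
# page URL can serve as the base for urljoin.
from urlparse import urlsplit, urlunsplit


def base_url(url):
    scheme, netloc, path, _, _ = urlsplit(url)
    return urlunsplit((scheme, netloc, path, '', ''))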
from urlparse import urljoin

from bs4 import BeautifulSoup


def get_urls(self, url, document):
    """Gets all the URLs in a document and returns them as absolute URLs.

    url -- the URL of the document
    document -- the content of the document
    """
    urls = []
    soup = BeautifulSoup(document)
    for link in soup.find_all('a'):
        href = link.get('href')
        if href is not None:
            try:
                # Convert relative URLs to absolute URLs
                if not href.startswith('http'):
                    href = urljoin(url, href)
                href = normalize_url(href)
                urls.append(href)
            except Exception:
                # skip links that fail to normalize
                pass
    return urls
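
# Minimal usage sketch (assumed, not from the original): fetch a page with
# urllib2 and extract its links. The body of get_urls never touches `self`,
# so passing None works when calling it outside a class; example.com is a
# placeholder URL.
import urllib2

url = 'http://example.com/'
document = urllib2.urlopen(url).read()
for link in get_urls(None, url, document):
    print link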
    >>> url_fix(u'http://de.wikipedia.org/wiki/Elf (Begriffsklärung)')
    'http://de.wikipedia.org/wiki/Elf%20%28Begriffskl%C3%A4rung%29'

    :param charset: The target charset for the URL if the url was
                    given as a unicode string.
    '''
    if isinstance(url, unicode):
        url = url.encode(charset, 'ignore')
    scheme, netloc, path, qs, anchor = urlparse.urlsplit(url)
    path = urllib.quote(path, '/%')
    qs = urllib.quote_plus(qs, ':&=')
    return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))


if __name__ == '__main__':
    url = 'http://a.a//../../asd/kk/../../../asd.asd/./ss/./././hsadk...$?1=1#kasjdl-qw'
    print url_merge_dots(url)
    print normalize_url(url)

    page = 'http://jadesoul-home'
    urls = u'''
    http://jadesoul-home/index.php
    http://jadesoul-home/?p=30
    a.html
    a/b/c/d.txt
    a/b/../c/d.txt
    http://a.a//../../asd/kk/../../../asd.asd/./ss/./././hsadk...$?1=1#kasjdl-qw
    https://www.abc.com./a.txt
    http://www.abc.com:80/a.txt
    https://www.abc.com.:8080/a.txt
    ftp://www.abc.com:21/a.txt
    ftp://www.abc.com:21/a.txt
    ftp://www.abc.com:21/ a.txt
    '''
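
# `url_merge_dots` and `normalize_url` are used above but defined elsewhere
# in this module. For reference, a minimal sketch of what url_merge_dots
# presumably does (an assumption, not the original implementation): resolve
# '.' and '..' segments in the path and leave the rest of the URL untouched.
import posixpath
from urlparse import urlsplit, urlunsplit


def url_merge_dots_sketch(url):
    scheme, netloc, path, qs, anchor = urlsplit(url)
    if path:
        # posixpath.normpath collapses '.' segments and redundant slashes
        # and resolves '..'; edge cases (trailing slash, leading '//') may
        # differ from the original helper.
        path = posixpath.normpath(path)
    return urlunsplit((scheme, netloc, path, qs, anchor))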