def getFormData(comPattern, content, posini): if not isinstance(comPattern, CustomRegEx.ExtRegexObject): raise TypeError('Expecting an CustomRegEx.ExtRegexObject') match = comPattern.search(content, posini) if not match: return None posfin = match.end() formHtml = match.group() formAttr = getAttrDict(formHtml, 0, noTag=False)[1] formAttr.pop('*ParamPos*') formFields = collections.OrderedDict() if formAttr and formAttr.has_key('id'): formId = formAttr['id'] pattern = r'''\$\(['"]<input/>['"]\)\.attr\(\{(?P<input_attr>[^}]+)\}\)\.prependTo\(['"]#%s['"]\)''' % formId prependVars = re.findall(pattern, content) for avar in prependVars: avar = avar.replace(': ', ':').replace(',', '').replace(':', '=') avar = '<input ' + avar + ' prepend="">' attr = getAttrDict(avar, 0, noTag=True) name = attr['name'] formFields[name] = attr pattern = r'(?#<form<__TAG__="input|select|textarea"=tag name=name>*>)' for m in CustomRegEx.finditer(pattern, formHtml): # tag, name = map(operator.methodcaller('lower'),m.groups()) tag, name = m.groups() p1, p2 = m.span() attr = getAttrDict(m.group(), 0, noTag=True) attr.pop('*ParamPos*') if formFields.get(name, None): if 'value' in attr and formFields[name].has_key('value'): value = formFields[name]['value'] if isinstance(value, basestring): value = [value] value.append(attr['value']) formFields[name]['value'] = value else: formFields[name] = attr if attr.has_key('list'): pattern = r'(?#<datalist id="%s"<value=value>*>)' % attr['list'] attr['value'] = CustomRegEx.findall(pattern, formHtml) pass elif tag == 'select': pattern = r'(?#<option value=value *=&lvalue&>)' match = CustomRegEx.findall(pattern, formHtml[p1:p2]) # attr['value'] = map(operator.itemgetter(0), match) # attr['lvalue'] = map(operator.itemgetter(1), match) attr['value'], attr['lvalue'] = match.groups() pattern = r'(?#<option value=value>)' attr['value'] = CustomRegEx.findall(pattern, formHtml[p1:p2]) pattern = r'(?#<option value=value selected>)' try: attr['default'] = 
CustomRegEx.findall(pattern, formHtml[p1:p2])[0] except: attr['default'] = '' pass elif tag == 'textarea': attr['value'] = attr.get('*', '') continue pass return posfin, formAttr, formFields
def test_general(self):
    answer = CustomRegEx.findall('(?#<hijo id="hijo1" *=label>)', self.htmlStr)
    required = ['primer hijo']
    assert answer == required, 'Comment and independent variable'

    answer = CustomRegEx.findall('(?#<hijo id=varid *=label>)', self.htmlStr)
    required = [('hijo1', 'primer hijo'), ('hijo2', ''), ('hijo3', 'tercer hijo')]
    assert answer == required, 'Using variables to distinguish cases'

    answer = CustomRegEx.findall('(?#<hijo id="hijo[13]"=varid *=label>)', self.htmlStr)
    required = [('hijo1', 'primer hijo'), ('hijo3', 'tercer hijo')]
    assert answer == required, 'Using variables to distinguish cases'

    answer = CustomRegEx.findall('(?#<hijo exp *=label>)', self.htmlStr)
    required = ['']
    assert answer == required, 'Using required attributes (exp) to distinguish a case'

    answer = CustomRegEx.findall('(?#<hijo exp .*>)', self.htmlStr)
    required = [('El primer comentario', 'El segundo comentario', 'El tercer comentario')]
    assert answer == required, 'Comments included in the tag'

    with pytest.raises(re.error):
        # Error: variables cannot be used when ".*" is the required variable
        CustomRegEx.compile('(?#<span class=var1 .*>)')
def thevideo(videoId, encHeaders=''):
    headers = {
        'User-Agent': DESKTOP_BROWSER,
        'Referer': 'http://thevideo.me/%s' % videoId
    }
    encodeHeaders = urllib.urlencode(headers)
    urlStr = 'http://thevideo.me/%s<headers>%s' % (videoId, encodeHeaders)
    content = basicFunc.openUrl(urlStr)[1]
    # Hidden fields that the page injects into the form through JavaScript
    pattern = r'''name: '(?P<var1>[^']+)', value: '(?P<var2>[^']+)' \}\).prependTo\(\"#veriform\"\)'''
    formVars = CustomRegEx.findall(pattern, content)
    # Fields declared directly in the form markup
    pattern = r"(?#<form .input<name=var1 value=var2>*>)"
    formVars.extend(CustomRegEx.findall(pattern, content))
    pattern = r"\$\.cookie\(\'(?P<var1>[^']+)\', \'(?P<var2>[^']+)\'"
    cookieval = CustomRegEx.findall(pattern, content)
    qte = urllib.quote
    postdata = '&'.join(
        map(lambda x: '='.join(x),
            [(var1, qte(var2) if var2 else '') for var1, var2 in formVars]))
    headers['Cookie'] = '; '.join(map(lambda x: '='.join(x), cookieval))
    encodeHeaders = urllib.urlencode(headers)
    urlStr = 'http://thevideo.me/%s<post>%s<headers>%s' % (videoId, postdata, encodeHeaders)
    content = basicFunc.openUrl(urlStr)[1]
    pattern = r"label: '(?P<res>[^']+)', file: '(?P<url>[^']+)'"
    sources = CustomRegEx.findall(pattern, content)
    res, href = sources.pop()
    return href
def getMenuHeaderFooterOLD(param, args, data, menus):
    htmlUnescape = HTMLParser.HTMLParser().unescape
    menuId = args.get('menu', ['rootmenu'])[0]
    url = args.get("url")[0]
    headerFooter = []
    for k, elem in enumerate(menus):
        opLabel, opregexp = elem
        opdefault, sep, opvalues = opregexp.partition('|')
        opvalues = opvalues or opdefault
        opdefault = opdefault if sep else ''
        pIni, pFin = 0, -1
        if opdefault.startswith('(?#<SPAN>)'):
            pIni, match = -1, CustomRegEx.search(opdefault, data)
            if match:
                pIni, pFin = match.span(0)
        opmenu = CustomRegEx.findall(opvalues, data[pIni:pFin])
        if not opmenu:
            continue
        cmpregex = CustomRegEx.compile(opvalues)
        tags = cmpregex.groupindex.keys()
        menuUrl = [elem[tags.index('url')] for elem in opmenu] if len(tags) > 1 else opmenu
        if 'label' in tags:
            menuLabel = map(htmlUnescape, [elem[tags.index('label')] for elem in opmenu])
        else:
            menuLabel = len(menuUrl) * ['Label placeholder']
        if opdefault:
            match = CustomRegEx.search(opdefault, data)
            opdefault = htmlUnescape(match.group(1) if match else '')
        paramDict = dict([(key, value[0]) for key, value in args.items()
                          if hasattr(value, "__getitem__") and key not in ["header", "footer"]])
        paramDict.update({'section': param, 'url': url, param: k, 'menu': menuId,
                          'menulabel': str(menuLabel), 'menuurl': str(menuUrl)})
        itemParam = {'isFolder': True, 'label': opLabel + opdefault}
        headerFooter.append([paramDict, itemParam, None])
    return headerFooter
def vidto(videoId, headers=None):
    headers = headers or {}
    headers['User-Agent'] = MOBILE_BROWSER
    encodeHeaders = urllib.urlencode(headers)
    url = 'http://vidto.me/%s.html<headers>%s' % (videoId, encodeHeaders)
    content = basicFunc.openUrl(url)[1]
    # Collect the hidden inputs of the POST form as (name, value) pairs
    pattern = r'(?#<Form method="POST".input<type="hidden" name=name value=value>*>)'
    formVars = CustomRegEx.findall(pattern, content)
    qte = urllib.quote
    postdata = '&'.join(
        map(lambda x: '='.join(x),
            [(var1, qte(var2) if var2 else '') for var1, var2 in formVars]))
    urlStr = 'http://vidto.me/%s.html<post>%s<headers>%s' % (videoId, postdata, encodeHeaders)
    content = basicFunc.openUrl(urlStr)[1]
    pattern = r'(?#<a class="player-url" href=url>)'
    sources = CustomRegEx.findall(pattern, content, re.DOTALL)
    href = sources.pop()
    urlStr = '%s|%s' % (href, urllib.urlencode({'User-Agent': MOBILE_BROWSER}))
    return urlStr
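
# The form-posting step shared by the resolvers in this file can be sketched
# on its own. This is a minimal illustration, not the module's API:
# build_postdata is a hypothetical helper name, and form_vars is assumed to be
# a list of (name, value) string tuples such as findall() returns for a
# pattern with two named groups.
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3


def build_postdata(form_vars):
    """Join (name, value) pairs into a form-encoded request body.

    Only the value is percent-encoded, mirroring the resolvers above;
    an empty value is emitted as an empty string.
    """
    pairs = [(name, quote(value) if value else '') for name, value in form_vars]
    return '&'.join('='.join(pair) for pair in pairs)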
def getParseDirectives(regexp):
    rawDir = CustomRegEx.findall(r'\?#<([^>]+)>', regexp)
    fltrDir = {}
    for rawkey in rawDir:
        # Strip the numeric suffix to get the directive name, e.g. SPAN2 -> SPAN
        key = rawkey.upper().strip('0123456789')
        if key in ['SPAN', 'NXTPOSINI']:
            value = int(rawkey[len(key):]) if len(rawkey) != len(key) else 0
            fltrDir[key] = value
    return fltrDir
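
# A standalone sketch of the directive parsing above. It swaps CustomRegEx for
# the standard re module so the example runs by itself; that substitution and
# the name parse_directives are assumptions made for illustration only.
import re


def parse_directives(regexp):
    """Collect SPAN / NXTPOSINI directives and their optional numeric suffix."""
    directives = {}
    for rawkey in re.findall(r'\?#<([^>]+)>', regexp):
        key = rawkey.upper().strip('0123456789')
        if key in ('SPAN', 'NXTPOSINI'):
            suffix = rawkey[len(key):]
            # A directive without a number defaults to 0
            directives[key] = int(suffix) if suffix else 0
    return directives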
def test_nzone(self):
    allspan = [
        ('independiente', 'span0'),
        ('bloque1', 'span1'), ('bloque1', 'span2'),  # Inside script
        ('independiente', 'bloque1'), ('independiente', 'bloque2'),  # Inside bloque
        ('bloque2', 'span1'), ('bloque2', 'span2'),  # Inside <!--
        ('independiente', 'span3')
    ]
    answer1 = CustomRegEx.findall('(?#<span class=test *=label>)', self.htmlStr)
    required = [lista for lista in allspan if lista[0] == 'independiente']
    assert answer1 == required, 'By default, matching tags in self.htmlStr that lie inside <!--xxx--> and script zones are excluded'

    answer2 = CustomRegEx.findall(
        '(?#<span class=test *=label __EZONE__="[!--|script]">)', self.htmlStr)
    assert answer1 == answer2, 'The default result is obtained with __EZONE__="[!--|script]"'

    answer = CustomRegEx.findall(
        '(?#<span class=test *=label __EZONE__="">)', self.htmlStr)
    assert answer == allspan, 'To have no exclusion zones, set __EZONE__=""'

    answer = CustomRegEx.findall(
        '(?#<span class=test *=label __EZONE__="[bloque]">)', self.htmlStr)
    required = [lista for lista in allspan if not lista[1].startswith('bloque')]
    assert answer == required, 'The exclusion zone is customized by setting __EZONE__="xxx|zzz", where xxx and zzz are tags'

    answer = CustomRegEx.findall(
        '(?#<span class=test *=label __EZONE__="^[!--|script]">)', self.htmlStr)
    required = [lista for lista in allspan if lista[0].startswith('bloque')]
    assert answer == required, 'To include only matching tags inside zones xxx and zzz, set __EZONE__="^[xxx|zzz]"'

    answer = CustomRegEx.findall('(?#<a href=url *=labe>)', self.htmlStr)
    required = []
    assert answer == required

    answer = CustomRegEx.findall(
        '(?#<a href=url *=label __EZONE__="^[script]">)', self.htmlStr)
    required = [('http://www.eltiempo.com.co', 'El Tiempo')]
    assert answer == required

    answer = CustomRegEx.findall(
        '(?#<a href=url *=label __EZONE__="^[!--]">)', self.htmlStr)
    required = [('http://www.elheraldo.com.co', 'El Heraldo')]
    assert answer == required
def _getSectionDelimiters(self, section):
    sections = self._sections
    if self._sections is None:
        content = self.getUrlContent()
        pattern = r'(?#<h[12] class="api.+?">)'
        sections = crgx.findall(pattern, content)
        sections = filter(lambda x: 'Protected' not in x, sections)
        sections.append(u'<!-- end jd-content -->')
        self._sections = sections
    # Skip to the requested section; the next entry marks where it ends
    it = itertools.dropwhile(lambda x: section not in x, sections)
    return (it.next(), it.next())
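
# The delimiter lookup at the end of _getSectionDelimiters can be tried on a
# plain list. A minimal sketch: section_delimiters is a hypothetical name, and
# the built-in next() is used in place of it.next() so the snippet runs on
# both Python 2.6+ and Python 3.
import itertools


def section_delimiters(sections, section):
    """Return the first entry containing `section` and the entry after it."""
    it = itertools.dropwhile(lambda entry: section not in entry, sections)
    # Raises StopIteration when the section, or its successor, is missing
    return next(it), next(it)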
def test_tag(self):
    answer = CustomRegEx.findall('(?#<span|a *=label>)', self.htmlStr)
    required1 = ['span0', 'bloque1', 'bloque2', 'span3']
    assert answer == required1, 'Get the text of span or a tags'

    cmpobj = CustomRegEx.compile('(?#<(span|a) *=label>)')
    answer = cmpobj.groupindex.keys()
    required2 = ['__TAG__', 'label']
    assert answer == required2, 'Enclosing the tag pattern in parentheses stores the tag name in the __TAG__ variable'

    answer = cmpobj.findall(self.htmlStr)
    required3 = [('span', 'span0'), ('span', 'bloque1'), ('span', 'bloque2'), ('span', 'span3')]
    assert answer == required3, 'The first component of each tuple in answer is the tag name'

    cmpobj = CustomRegEx.compile(
        '(?#<span|a __TAG__=mi_nametag_var *=label>)')
    answer = cmpobj.groupindex.keys()
    required4 = ['mi_nametag_var', 'label']
    assert answer == required4, 'The __TAG__ attribute can assign a variable that holds the tag name of the tags matching the pattern'

    answer = cmpobj.findall(self.htmlStr)
    assert answer == required3, 'The result is the same; only the name of the variable bound to the tag name changes'

    cmpobj = CustomRegEx.compile('(?#<__TAG__ *="[sb].+?"=label>)')
    answer = cmpobj.findall(self.htmlStr)
    assert answer == required1, 'Using __TAG__ as the tag sets tagpattern = "[a-zA-Z][^\s>]*"; to reproduce the first result, "[sb].+?" is assigned to *'

    cmpobj = CustomRegEx.compile('(?#<(__TAG__) *=".+?"=label>)')
    answer = cmpobj.groupindex.keys()
    assert answer == required2, '(__TAG__) can be used to store the tag name in the __TAG__ variable'

    cmpobj = CustomRegEx.compile(
        '(?#<__TAG__ __TAG__=mi_nametag_var *=".+?"=label>)')
    answer = cmpobj.groupindex.keys()
    assert answer == required4, '__TAG__=varname can be used to store the tag name in a variable with its own name'

    cmpobj = CustomRegEx.compile(
        '(?#<__TAG__ __TAG__=mi_nametag_var *=label>)')
    answer = cmpobj.findall(self.htmlStr)
    required = [('span', 'span0'), ('script', ''), ('bloque', ''), ('span', 'span3')]
    assert answer == required, 'Using __TAG__ as the tag pattern'

    cmpobj = CustomRegEx.compile(
        '(?#<__TAG__ __TAG__="span|a"=mi_nametag_var *=label>)')
    answer = cmpobj.findall(self.htmlStr)
    assert answer == required3, 'Using __TAG__="span|a"=mi_nametag_var redefines the tag pattern as "span|a" and assigns it to the variable mi_nametag_var'

    with pytest.raises(re.error):
        # Raises an error because (__TAG__) is the tag pattern and
        # __TAG__=mi_nametag_var tries to assign it to another variable
        CustomRegEx.compile(
            '(?#<(__TAG__) __TAG__=mi_nametag_var *=label>)')
def prepareEqLocals(self, startEq):
    pattern = r'[+*]*(?:start|stop)\("(.+?)"\)[*+]*'
    procIds = CustomRegEx.findall(pattern, startEq.replace(' ', ''))
    locals = dict()
    for id in procIds:
        key2, key1 = '_end%s_' % id, '_beg%s_' % id
        locals[key1] = id in self.activeList
        locals[key2] = False
    functions = dict(
        lt=lambda x, n: self.actProcess(x) < n,
        gt=lambda x, n: self.actProcess(x) > n,
        isact=lambda x: x in self.activeList,
        start=lambda x: locals['_beg%s_' % x],
        stop=lambda x: locals['_beg%s_' % x] and locals['_end%s_' % x])
    locals.update(functions)
    return locals
def allmyvideos(videoId, headers=None):
    headers = headers or {}
    headers['User-Agent'] = MOBILE_BROWSER
    encodeHeaders = urllib.urlencode(headers)
    url = 'http://allmyvideos.net/%s<headers>%s' % (videoId, encodeHeaders)
    content = basicFunc.openUrl(url)[1]
    pattern = r'(?#<form .input<name=name value=value>*>)'
    formVars = CustomRegEx.findall(pattern, content)
    qte = urllib.quote
    postdata = '&'.join(
        map(lambda x: '='.join(x),
            [(var1, qte(var2) if var2 else '') for var1, var2 in formVars]))
    urlStr = 'http://allmyvideos.net/%s<post>%s<headers>%s' % (videoId, postdata, encodeHeaders)
    content = basicFunc.openUrl(urlStr)[1]
    pattern = r'"file" : "(?P<url>[^"]+)".+?"label" : "(?P<label>[^"]+)"'
    sources = re.findall(pattern, content, re.DOTALL)
    href, res = sources.pop()
    urlStr = '%s|%s' % (href, urllib.urlencode({'User-Agent': MOBILE_BROWSER}))
    return urlStr
def vidzi(videoId, headers=None):
    strVal = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    headers = headers or {}
    headers['User-Agent'] = MOBILE_BROWSER
    encodeHeaders = urllib.urlencode(headers)
    url = 'http://vidzi.tv/%s.html<headers>%s' % (videoId, encodeHeaders)
    content = basicFunc.openUrl(url)[1]
    pattern = r"(?#<script *='eval.+?'=pack>)"
    packed = CustomRegEx.search(pattern, content).group('pack')
    pattern = "}\((?P<tupla>\'.+?\))(?:,0,{})*\)"
    m = re.search(pattern, packed)
    mgrp = m.group(1).rsplit(',', 3)
    patron, base, nTags, lista = mgrp[0], int(mgrp[1]), int(mgrp[2]), eval(mgrp[3])
    # Replace each short tag in the packed template with the word it stands for
    while nTags:
        nTags -= 1
        tag = strVal[nTags] if nTags < base else strVal[nTags / base] + strVal[nTags % base]
        patron = re.sub('\\b' + tag + '\\b', lista[nTags] or tag, patron)
    pattern = 'file:"([^"]+(?:mp4|ed=))"'
    sources = CustomRegEx.findall(pattern, patron)
    return sources.pop()
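
# The unpacking loop in vidzi() can be illustrated without network access.
# A minimal sketch under stated assumptions: word_tag and unpack are
# hypothetical names, the word list is passed in directly instead of being
# eval()-ed from the page, and the word count is taken from len(words).
# Floor division (//) is used so the index stays an integer on Python 3, and
# a function replacement keeps backslashes in the words from being
# interpreted by re.sub.
import re

DIGITS = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'


def word_tag(index, base):
    """Return the one- or two-character tag that stands for word `index`."""
    if index < base:
        return DIGITS[index]
    return DIGITS[index // base] + DIGITS[index % base]


def unpack(template, base, words):
    """Replace every tag in `template` by its word; empty words keep the tag."""
    for index in reversed(range(len(words))):
        tag = word_tag(index, base)
        word = words[index] or tag
        template = re.sub(r'\b' + tag + r'\b', lambda m: word, template)
    return template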
def openload(videoId, headers=None): headers = headers or {} headers['User-Agent'] = MOBILE_BROWSER encodeHeaders = urllib.urlencode(headers) urlStr = 'https://openload.co/embed/%s/<headers>%s' % (videoId, encodeHeaders) content = basicFunc.openUrl(urlStr)[1] varTags = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' pattern = r'(?#<video script.*=puzzle>)' puzzle = CustomRegEx.findall(pattern, content)[0] vars = sorted(set(re.findall(r'\(([^=)(]+)\) *=', puzzle))) keys1 = re.findall(r', *(?P<key>[^: ]+) *:', puzzle) keys2 = re.findall(r"\(゚Д゚\) *\[[^']+\] *=", puzzle) keys = sorted(set(keys1 + keys2)) totVars = vars + keys for k in range(len(vars)): puzzle = puzzle.replace(vars[k], varTags[k]) for k in range(len(keys)): puzzle = puzzle.replace(keys[k], varTags[-k - 1]) # puzzle = puzzle.replace('\xef\xbe\x89'.decode('utf-8'), '').replace(' ','') puzzle = re.sub(r'[ \x80-\xff]', '', puzzle) pat_dicId = r'\(([A-Z])\)={' m = re.search(pat_dicId, puzzle) assert m, 'No se encontro Id del diccionario' dicId = m.group(1) # pat_obj = r"\(\(%s\)\+\\'_\\'\)" % dicId dic_pat1 = r"\(\(%s\)\+\'_\'\)" % dicId dic_pat2 = r"\(%s\+([^+)]+)\)" % dicId dic_pat3 = r"\(%s\)\.(.+?)\b" % dicId dic_pat4 = r"(?<=[{,])([^: ]+)(?=:)" puzzle = re.sub(dic_pat1, "'[object object]_'", puzzle) puzzle = re.sub(dic_pat2, lambda x: "('[object object]'+str((%s)))" % x.group(1), puzzle) puzzle = re.sub(dic_pat3, lambda x: "(%s)['%s']" % (dicId, x.group(1)), puzzle) puzzle = re.sub(dic_pat4, lambda x: "'%s'" % x.group(1), puzzle) pat_str1 = r"\((\(.+?\)|[A-Z])\+\'_\'\)" pat_str2 = r"\([^()]+\)\[[A-Z]\]\[[A-Z]\]" pat_str3 = r"(?<=;)([^+]+)\+=([^;]+)" puzzle = re.sub(pat_str1, lambda x: "(str((%s))+'_')" % x.group(1), puzzle) puzzle = re.sub(pat_str2, "'function'", puzzle) puzzle = re.sub( pat_str3, lambda x: "%s=%s+%s" % (x.group(1), x.group(1), x.group(2)), puzzle) codeGlb = {} code = puzzle.split(';') code.pop() code[0] = code[0][:2] + "'undefined'" for linea in code[:-1]: linea = re.sub(r"\(([A-Z]+)\)", lambda x: 
x.group(1), linea) varss = re.split(r"(?<=[_a-zA-Z\]])=(?=[^=])", linea) value = eval(varss.pop(), codeGlb) for var in varss: m = re.match(r"([^\[]+)\[([^\]]+)\]", var) if m: var, key = m.groups() key = eval(key, codeGlb) codeGlb[var][key] = value else: codeGlb[var] = value linea = code[-1] linea = re.sub(r"\(([A-Z]+)\)", lambda x: x.group(1), linea) linea = re.sub(r"\([oc]\^_\^o\)", lambda x: "%s" % eval(x.group(), codeGlb), linea) while re.search(r"\([^)\]'\[(]+\)", linea): linea = re.sub(r"\([^)\]'\[(]+\)", lambda x: "%s" % eval(x.group(), codeGlb), linea) linea = re.sub(r"[A-Z](?=[^\]\[])", lambda x: "%s" % eval(x.group(), codeGlb), linea) linea = re.sub(r"E\[[\'_A-Z]+\]", lambda x: "%s" % eval(x.group(), codeGlb), linea) linea = linea.replace('+', '') linea = linea.decode('unicode-escape') m = re.search(r'http.+?true', linea) urlStr = basicFunc.openUrl(m.group(), True) urlStr = '%s|%s' % (m.group(), urllib.urlencode({'User-Agent': MOBILE_BROWSER})) return urlStr
def openloadORIG(videoId, encHeaders = ''): headers = {'User-Agent':MOBILE_BROWSER} encodeHeaders = urllib.urlencode(headers) urlStr = 'https://openload.co/embed/%s/<headers>%s' % (videoId, encodeHeaders) content = basicFunc.openUrl(urlStr)[1] varTags = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' pattern = r'(?#<video script.*=puzzle>)' puzzle = CustomRegEx.findall(pattern, content)[0] vars = sorted(set(re.findall(r'\(([^=)(]+)\) *=', puzzle))) keys1 = re.findall(r', *(?P<key>[^: ]+) *:', puzzle) keys2 = re.findall(r"\(゚Д゚\) *\[[^']+\] *=", puzzle) keys = sorted(set(keys1 + keys2)) totVars = vars + keys for k in range(len(vars)): puzzle = puzzle.replace(vars[k], varTags[k]) for k in range(len(keys)): puzzle = puzzle.replace(keys[k], varTags[-k - 1]) # puzzle = puzzle.replace('\xef\xbe\x89'.decode('utf-8'), '').replace(' ','') puzzle = re.sub(r'[ \x80-\xff]','',puzzle) pat_dicId = r'\(([A-Z])\)={' m = re.search(pat_dicId, puzzle) assert m, 'No se encontro Id del diccionario' dicId = m.group(1) # pat_obj = r"\(\(%s\)\+\\'_\\'\)" % dicId dic_pat1 = r"\(\(%s\)\+\'_\'\)" % dicId dic_pat2 = r"\(%s\+([^+)]+)\)" % dicId dic_pat3 = r"\(%s\)\.(.+?)\b" % dicId dic_pat4 = r"(?<=[{,])([^: ]+)(?=:)" puzzle = re.sub(dic_pat1, "'[object object]_'", puzzle) puzzle = re.sub(dic_pat2, lambda x: "('[object object]'+str(%s))" % x.group(1), puzzle) puzzle = re.sub(dic_pat3, lambda x: "(%s)['%s']" % (dicId, x.group(1)), puzzle) puzzle = re.sub(dic_pat4, lambda x: "'%s'" % x.group(1), puzzle) pat_str1 = r"\((\(.+?\)|[A-Z])\+\'_\'\)" pat_str2 = r"\([^()]+\)\[[A-Z]\]\[[A-Z]\]" puzzle = re.sub(pat_str1, lambda x: "(str(%s)+'_')" % x.group(1), puzzle) puzzle = re.sub(pat_str2, "'function'", puzzle) codeGlb = {} code = puzzle.split(';') code.pop() code[0] = code[0][:2] + "'undefined'" # for k, linea in enumerate(code[:-1]): # try: # exec(linea, codeGlb) # except: # print 'Linea %s con errores ' % k, linea # code[k] = linea.split('=')[0] + '=' + "'\\\\'" # print 'Se corrige como ', code[k] # exec(code[k], 
codeGlb) linea = code[-1] linea = re.sub(r"\(([A-Z]+)\)", lambda x: x.group(1), linea) linea = re.sub(r"\([oc]\^_\^o\)", lambda x: "%s" % eval(x.group(), codeGlb), linea) while re.search(r"\([^)\]'\[(]+\)", linea): linea = re.sub(r"\([^)\]'\[(]+\)", lambda x: "%s" % eval(x.group(), codeGlb), linea) linea = re.sub(r"[A-Z](?=[^\]\[])", lambda x: "%s" % eval(x.group(), codeGlb), linea) linea = re.sub(r"E\[[\'_A-Z]+\]", lambda x: "%s" % eval(x.group(), codeGlb), linea) linea = linea.replace('+', '') linea = linea.decode('unicode-escape') m = re.search(r'http.+?true', linea) urlStr = '%s|%s' % (m.group(),encodeHeaders) return urlStr
def getMenuHeaderFooter(param, args, data, menus):
    htmlUnescape = HTMLParser.HTMLParser().unescape
    menuId = args.get('menu', ['rootmenu'])[0]
    url = args.get("url")[0]
    headerFooter = []
    for k, elem in enumerate(menus):
        opLabel, opregexp = elem
        # Each menu regexp has the form 'default|values'; without a '|'
        # the whole string is the values pattern and there is no default.
        opdefault, sep, opvalues = opregexp.partition('|')
        opvalues = opvalues or opdefault
        opdefault = opdefault if sep else ''
        pIni, pFin = 0, -1
        # A default pattern starting with '(?#<SPAN>)' restricts the search
        # for values to the span it matches.
        if opdefault.startswith('(?#<SPAN>)'):
            pIni, match = -1, CustomRegEx.search(opdefault, data)
            if match:
                pIni, pFin = match.span(0)
        opmenu = CustomRegEx.findall(opvalues, data[pIni:pFin])
        if not opmenu:
            continue
        tags = CustomRegEx.compile(opvalues).groupindex.keys()
        if 'url' in tags:
            if len(tags) > 1:
                menuUrl = [htmlUnescape(elem[tags.index('url')]) for elem in opmenu]
            else:
                menuUrl = [htmlUnescape(opmenu[0].replace('\\/', '/'))]
        if 'label' in tags:
            menuLabel = map(htmlUnescape, [elem[tags.index('label')] for elem in opmenu])
        else:
            placeHolder = 'Next >>>' if param == 'footer' else 'Header >>>'
            menuLabel = len(opmenu)*[placeHolder]
        if len(opmenu) == 1:
            opLabel = menuLabel[0]
        if 'varvalue' in tags:
            varValue = [elem[tags.index('varvalue')] for elem in opmenu] if len(tags) > 1 else opmenu
        if opdefault:
            cmpregex = CustomRegEx.compile(opdefault)
            tags = cmpregex.groupindex.keys()
            match = cmpregex.search(data)
            if tags:
                if 'label' in tags:
                    opdefault = htmlUnescape(match.group(1) if match else '')
                elif 'defvalue' in tags:
                    opdefault = htmlUnescape(match.group('defvalue'))
                elif 'varname' in tags:
                    # Rebuild the menu URLs by substituting each candidate
                    # value for the named variable in the query string.
                    varName = match.group('varname')
                    urlquery = urlparse.urlsplit(url).query
                    queryDict = dict(urlparse.parse_qsl(urlquery))
                    opdefault = queryDict.get(varName, '')
                    try:
                        indx = varValue.index(opdefault)
                    except:
                        opdefault = ''
                    else:
                        opdefault = menuLabel[indx]
                    menuUrl = []
                    for elem in varValue:
                        queryDict[varName] = elem
                        menuUrl.append('?' + urllib.urlencode(queryDict))
            else:
                opdefault = htmlUnescape(match.group(1) if match else '')
        paramDict = dict([(key, value[0]) for key, value in args.items()
                          if hasattr(value, "__getitem__") and key not in ["header", "footer"]])
        paramDict.update({'section':param, 'url':url, param:k, 'menu':menuId})
        paramDict['menulabel'] = base64.urlsafe_b64encode(str(menuLabel))
        paramDict['menuurl'] = base64.urlsafe_b64encode(str(menuUrl))
        label = '[COLOR yellow]' + opLabel + opdefault + '[/COLOR]'
        itemParam = {'isFolder':True, 'label':label}
        headerFooter.append([paramDict, itemParam, None])
    return headerFooter

def getMenuHeaderFooter(param, args, data, menus):
    htmlUnescape = HTMLParser.HTMLParser().unescape
    menuId = args.get('menu', ['rootmenu'])[0]
    url = args.get("url")[0]
    headerFooter = []
    for k, elem in enumerate(menus):
        opLabel, opregexp = elem
        opdefault, sep, opvalues = opregexp.partition('|')
        opvalues = opvalues or opdefault
        opdefault = opdefault if sep else ''
        pIni, pFin = 0, -1
        if opdefault.startswith('(?#<SPAN>)'):
            pIni, match = -1, CustomRegEx.search(opdefault, data)
            if match:
                pIni, pFin = match.span(0)
        opmenu = CustomRegEx.findall(opvalues, data[pIni:pFin])
        if not opmenu:
            continue
        tags = CustomRegEx.compile(opvalues).groupindex.keys()
        if 'url' in tags:
            menuUrl = [elem[tags.index('url')] for elem in opmenu] if len(tags) > 1 else opmenu[0]
        if 'label' in tags:
            menuLabel = map(htmlUnescape, [elem[tags.index('label')] for elem in opmenu])
        else:
            placeHolder = 'Next >>>' if param == 'footer' else 'Header >>>'
            menuLabel = len(menuUrl)*[placeHolder]
        if len(opmenu) == 1:
            opLabel = menuLabel[0]
        if 'varvalue' in tags:
            varValue = [elem[tags.index('varvalue')] for elem in opmenu] if len(tags) > 1 else opmenu
        if opdefault:
            cmpregex = CustomRegEx.compile(opdefault)
            tags = cmpregex.groupindex.keys()
            match = cmpregex.search(data)
            if tags:
                if 'label' in tags:
                    opdefault = htmlUnescape(match.group(1) if match else '')
                elif 'defvalue' in tags:
                    opdefault = htmlUnescape(match.group('defvalue'))
                elif 'varname' in tags:
                    varName = match.group('varname')
                    urlquery = urlparse.urlsplit(url).query
                    queryDict = dict(urlparse.parse_qsl(urlquery))
                    opdefault = queryDict.get(varName, '')
                    try:
                        indx = varValue.index(opdefault)
                    except:
                        opdefault = ''
                    else:
                        opdefault = menuLabel[indx]
                    menuUrl = []
                    for elem in varValue:
                        queryDict[varName] = elem
                        menuUrl.append('?' + urllib.urlencode(queryDict))
        paramDict = dict([(key, value[0]) for key, value in args.items()
                          if hasattr(value, "__getitem__") and key not in ["header", "footer"]])
        paramDict.update({'section':param, 'url':url, param:k, 'menu':menuId})
        paramDict['menulabel'] = base64.urlsafe_b64encode(str(menuLabel))
        paramDict['menuurl'] = base64.urlsafe_b64encode(str(menuUrl))
        label = '[COLOR yellow]' + opLabel + opdefault + '[/COLOR]'
        itemParam = {'isFolder':True, 'label':label}
        headerFooter.append([paramDict, itemParam, None])
    return headerFooter