def linkify(html): links = set() from .page import Page def clean_link(attrs, new=False): if (None, "href") not in attrs: return None href = attrs[(None, "href")] thread_title = get_thread_title(href) if thread_title is None: _fnm, ext = os.path.splitext(urlparse(href).path) if ext.lower() in img_exts: attrs["_text"] = sanitize_html("<img src='{}'>".format(href)) else: attrs["_text"] = sanitize_text(href) else: try: # TODO: improve performance here page = Page.find(thread_title) attrs["_text"] = sanitize_text(page.name) except PageNotFound: attrs["_text"] = sanitize_text(title_to_name(thread_title)) links.add(thread_title) return attrs linker = bleach.Linker( callbacks=[clean_link], url_re=bleach.linkifier.build_url_re(tlds=["[a-z]+"])) return links, normalize(linker.linkify(html))
def rich_text(text: str, excluded_tags=""): """ Processes markdown and cleans HTML in a text input. """ text = str(text) linker = bleach.Linker(parse_email=True) body_md = linker.linkify(markdown_compile(text, excluded_tags=excluded_tags)) return mark_safe(body_md)
def rich_text_snippet(text: str, **kwargs): """ Processes markdown and cleans HTML in a text input. """ text = str(text) linker = bleach.Linker( url_re=URL_RE, email_re=EMAIL_RE, callbacks=DEFAULT_CALLBACKS + ([truelink_callback, safelink_callback] if kwargs.get('safelinks', True) else [truelink_callback, abslink_callback]), parse_email=True ) body_md = linker.linkify(markdown_compile(text, snippet=True)) return mark_safe(body_md)
def markdown_compile_email(source): linker = bleach.Linker(url_re=URL_RE, email_re=EMAIL_RE, parse_email=True) return linker.linkify( bleach.clean( markdown.markdown( source, extensions=[ 'markdown.extensions.sane_lists', # 'markdown.extensions.nl2br' # disabled for backwards-compatibility ]), tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES, protocols=ALLOWED_PROTOCOLS, ))