def loadTerms():
    global LOADEDTERMS
    if not LOADEDTERMS:
        LOADEDTERMS = True
        print("Loading triples files")
        SdoTermSource.loadSourceGraph("default")
        print("loaded %s triples - %s terms" %
              (len(SdoTermSource.sourceGraph()),
               len(SdoTermSource.getAllTerms())))

def test_booleanDataType(self):
    self.assertTrue(
        SdoTermSource.getTerm("Boolean").termType == SdoTerm.DATATYPE)
    self.assertTrue(
        SdoTermSource.getTerm("DataType").termType == SdoTerm.DATATYPE)
    self.assertFalse(
        SdoTermSource.getTerm("Thing").termType == SdoTerm.DATATYPE)
    self.assertFalse(
        SdoTermSource.getTerm("Duration").termType == SdoTerm.DATATYPE)

def test_alltypes(self):
    # ballpark estimates.
    self.assertTrue(
        len(SdoTermSource.getAllTypes()) > TYPECOUNT_LOWERBOUND,
        "Should be > %d types. Got %s" %
        (TYPECOUNT_LOWERBOUND, len(SdoTermSource.getAllTypes())))
    self.assertTrue(
        len(SdoTermSource.getAllTypes()) < TYPECOUNT_UPPERBOUND,
        "Should be < %d types. Got %s" %
        (TYPECOUNT_UPPERBOUND, len(SdoTermSource.getAllTypes())))

def test_alumniSuperproperty(self):
    p_alumni = SdoTermSource.getTerm("alumni")
    p_suggestedAnswer = SdoTermSource.getTerm("suggestedAnswer")
    self.assertFalse("alumni" in p_suggestedAnswer.supers,
                     "not suggestedAnswer subPropertyOf alumni.")
    self.assertFalse("suggestedAnswer" in p_alumni.supers,
                     "not alumni subPropertyOf suggestedAnswer.")
    self.assertFalse("alumni" in p_alumni.supers,
                     "not alumni subPropertyOf alumni.")
    self.assertFalse("alumniOf" in p_alumni.supers,
                     "not alumni subPropertyOf alumniOf.")
    self.assertFalse("suggestedAnswer" in p_suggestedAnswer.supers,
                     "not suggestedAnswer subPropertyOf suggestedAnswer.")

def schemasPage(page):
    extra_vars = {
        'home_page': "False",
        'title': SITENAME + ' - Schemas',
        'termcounts': SdoTermSource.termCounts()
    }
    return docsTemplateRender("docs/Schemas.j2", extra_vars)

def _jsonldtree(tid, term=None):
    termdesc = SdoTermSource.getTerm(tid)
    if not term:
        term = {}
    term['@type'] = "rdfs:Class"
    term['@id'] = "schema:" + termdesc.id
    term['name'] = termdesc.label
    if termdesc.supers:
        sups = []
        for sup in termdesc.supers:
            sups.append("schema:" + sup)
        if len(sups) == 1:
            term['rdfs:subClassOf'] = sups[0]
        else:
            term['rdfs:subClassOf'] = sups
    term['description'] = ShortenOnSentence(StripHtmlTags(termdesc.comment))
    if termdesc.pending:
        term['pending'] = True
    if termdesc.retired:
        term['attic'] = True
    if tid not in VISITLIST:
        # Only descend into children the first time a term is visited
        VISITLIST.append(tid)
        if termdesc.subs:
            subs = []
            for sub in termdesc.subs:
                subs.append(_jsonldtree(sub))
            term['children'] = subs
    return term

def __init__(self, term, depth=0, title="", parent=None):
    global VISITLIST
    termdesc = SdoTermSource.getTerm(term)
    if parent is None:
        # Root node: start a fresh visit list for this listing
        VISITLIST = []
    self.repeat = False
    self.subs = []
    self.parent = parent
    self.title = title
    self.id = termdesc.label
    self.termType = termdesc.termType
    self.depth = depth
    self.retired = termdesc.retired
    self.pending = termdesc.pending
    if self.id not in VISITLIST:
        VISITLIST.append(self.id)
        if termdesc.termType == SdoTerm.ENUMERATION:
            for enum in sorted(termdesc.enumerationMembers):
                self.subs.append(
                    listingNode(enum, depth=depth + 1, parent=self))
        for sub in sorted(termdesc.subs):
            self.subs.append(listingNode(sub, depth=depth + 1, parent=self))
    else:
        # Visited this node before so don't parse children
        self.repeat = True

def sitemap(page):
    node = """ <url>
   <loc>https://schema.org/%s</loc>
   <lastmod>%s</lastmod>
 </url>
"""
    STATICPAGES = [
        "docs/schemas.html",
        "docs/full.html",
        "docs/gs.html",
        "docs/about.html",
        "docs/howwework.html",
        "docs/releases.html",
        "docs/faq.html",
        "docs/datamodel.html",
        "docs/developers.html",
        "docs/extension.html",
        "docs/meddocs.html",
        "docs/hotels.html"
    ]

    output = []
    output.append("""<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
""")
    terms = SdoTermSource.getAllTerms(supressSourceLinks=True)
    ver = getVersionDate(getVersion())
    for term in terms:
        # Skip terms identified by a full external URI
        if not (term.startswith("http://") or term.startswith("https://")):
            output.append(node % (term, ver))
    for term in STATICPAGES:
        output.append(node % (term, ver))
    output.append("</urlset>\n")
    return "".join(output)

def buildequivs(format):
    s_p = "http://schema.org/"
    s_s = "https://schema.org/"
    outGraph = rdflib.Graph()
    outGraph.bind("schema_p", s_p)
    outGraph.bind("schema_s", s_s)
    outGraph.bind("owl", OWL)

    for t in SdoTermSource.getAllTerms(expanded=True):
        if not t.retired:  # drops non-schema terms and those in attic
            eqiv = OWL.equivalentClass
            if t.termType == SdoTerm.PROPERTY:
                eqiv = OWL.equivalentProperty

            p = URIRef(s_p + t.id)
            s = URIRef(s_s + t.id)
            outGraph.add((p, eqiv, s))
            outGraph.add((s, eqiv, p))
            #log.info("%s " % t.uri)

    for ftype in exts:
        if format != "all" and format != ftype:
            continue
        ext = exts[ftype]
        kwargs = {'sort_keys': True}
        format = ftype
        if format == "rdf":
            format = "pretty-xml"
        return outGraph.serialize(format=format, auto_compact=True, **kwargs)

def exportcsv(page):
    protocol, altprotocol = protocols()

    typeFields = [
        "id", "label", "comment", "subTypeOf", "enumerationtype",
        "equivalentClass", "properties", "subTypes", "supersedes",
        "supersededBy", "isPartOf"
    ]
    propFields = [
        "id", "label", "comment", "subPropertyOf", "equivalentProperty",
        "subproperties", "domainIncludes", "rangeIncludes", "inverseOf",
        "supersedes", "supersededBy", "isPartOf"
    ]
    typedata = []
    typedataAll = []
    propdata = []
    propdataAll = []
    terms = SdoTermSource.getAllTerms(expanded=True, supressSourceLinks=True)
    for term in terms:
        if term.termType == SdoTerm.REFERENCE or term.id.startswith(
                "http://") or term.id.startswith("https://"):
            continue
        row = {}
        row["id"] = term.uri
        row["label"] = term.label
        row["comment"] = term.comment
        row["supersedes"] = uriwrap(term.supersedes)
        row["supersededBy"] = uriwrap(term.supersededBy)
        #row["isPartOf"] = term.isPartOf
        row["isPartOf"] = ""
        if term.termType == SdoTerm.PROPERTY:
            row["subPropertyOf"] = uriwrap(term.supers)
            row["equivalentProperty"] = array2str(term.equivalents)
            row["subproperties"] = uriwrap(term.subs)
            row["domainIncludes"] = uriwrap(term.domainIncludes)
            row["rangeIncludes"] = uriwrap(term.rangeIncludes)
            row["inverseOf"] = uriwrap(term.inverse)
            propdataAll.append(row)
            if not term.retired:
                propdata.append(row)
        else:
            row["subTypeOf"] = uriwrap(term.supers)
            if term.termType == SdoTerm.ENUMERATIONVALUE:
                row["enumerationtype"] = uriwrap(term.enumerationParent)
            else:
                row["properties"] = uriwrap(term.allproperties)
            row["equivalentClass"] = array2str(term.equivalents)
            row["subTypes"] = uriwrap(term.subs)
            typedataAll.append(row)
            if not term.retired:
                typedata.append(row)

    writecsvout("properties", propdata, propFields, "current", protocol,
                altprotocol)
    writecsvout("properties", propdataAll, propFields, "all", protocol,
                altprotocol)
    writecsvout("types", typedata, typeFields, "current", protocol,
                altprotocol)
    writecsvout("types", typedataAll, typeFields, "all", protocol,
                altprotocol)

def fullReleasePage(page):
    listings = []
    listings.append(listingNode("Thing", title="Type hierarchy"))
    types = SdoTermSource.getAllEnumerationvalues(expanded=True)
    types.extend(SdoTermSource.getAllTypes(expanded=True))
    types = SdoTermSource.expandTerms(types)
    types = sorted(types, key=lambda t: t.id)
    extra_vars = {
        'home_page': "False",
        'title': "Full Release Summary",
        'version': getVersion(),
        'date': getCurrentVersionDate(),
        'listings': listings,
        'types': types,
        'properties': SdoTermSource.getAllProperties(expanded=True)
    }
    return docsTemplateRender("docs/FullRelease.j2", extra_vars)

def protocols():
    vocaburi = SdoTermSource.vocabUri()
    protocol = "http"
    altprotocol = "https"
    if vocaburi.startswith("https"):
        protocol = "https"
        altprotocol = "http"
    return protocol, altprotocol

def test_acceptedAnswerSuperpropertiesArrayLen(self):
    p_acceptedAnswer = SdoTermSource.getTerm("acceptedAnswer")
    aa_supers = p_acceptedAnswer.supers
    #for f in aa_supers:
    #    log.info("acceptedAnswer's superproperties: %s" % f)
    self.assertTrue(
        len(aa_supers) == 1,
        "acceptedAnswer supers gives array of len 1. Actual: %s ." %
        len(aa_supers))

def test_alumniInverse(self):
    p_alumni = SdoTermSource.getTerm("alumni")
    p_alumniOf = SdoTermSource.getTerm("alumniOf")
    p_suggestedAnswer = SdoTermSource.getTerm("suggestedAnswer")
    #log.info("alumni: " + str(p_alumniOf.getInverseOf() ))
    self.assertTrue("alumni" == p_alumniOf.inverse,
                    "alumniOf inverseOf alumni.")
    self.assertTrue("alumniOf" == p_alumni.inverse,
                    "alumni inverseOf alumniOf.")
    self.assertFalse("alumni" == p_alumni.inverse,
                     "Not alumni inverseOf alumni.")
    self.assertFalse("alumniOf" == p_alumniOf.inverse,
                     "Not alumniOf inverseOf alumniOf.")
    self.assertFalse("alumni" == p_suggestedAnswer.inverse,
                     "Not suggestedAnswer inverseOf alumni.")

def test_gotFooBarThing(self):
    foobar = SdoTermSource.getTerm("FooBar")
    gotFooBar = foobar is not None
    self.assertEqual(
        gotFooBar, False,
        "FooBar node should NOT be accessible via getTerm('FooBar').")

def test_gotThing(self):
    thing = SdoTermSource.getTerm("Thing")
    gotThing = thing is not None
    self.assertEqual(
        gotThing, True,
        "Thing node should be accessible via getTerm('Thing').")

def buildTerms(terms):
    import datetime

    # Any spelling of "all" requests pages for every term
    allKeywords = ["ALL", "All", "all"]
    for a in allKeywords:
        if a in terms:
            terms = SdoTermSource.getAllTerms(supressSourceLinks=True)
            break

    start = datetime.datetime.now()
    lastCount = 0
    if len(terms):
        print("\nBuilding term pages...\n")
    for t in terms:
        tic = datetime.datetime.now()  # diagnostics
        term = SdoTermSource.getTerm(t, expanded=True)
        if not term:
            print("No such term: %s\n" % t)
            continue

        if term.termType == SdoTerm.REFERENCE:
            # Don't create pages for reference types
            continue
        examples = SchemaExamples.examplesForTerm(term.id)
        pageout = termtemplateRender(term, examples)
        with open(termFileName(term.id), "w") as f:
            f.write(pageout)

        # diagnostics ##########################################
        termsofar = len(SdoTermSource.termCache())  # diagnostics
        termscreated = termsofar - lastCount  # diagnostics
        lastCount = termsofar  # diagnostics
        print("Term: %s (%d) - %s" %
              (t, termscreated,
               str(datetime.datetime.now() - tic)))  # diagnostics
        # Note: (%d) = number of individual newly created (not cached) term
        # definitions needed to build this expanded definition,
        # i.e. all Properties associated with a Type, etc.

    if len(terms):
        print()
        # The elapsed time is a timedelta, printed as H:MM:SS.ffffff
        print("All terms took %s" %
              str(datetime.datetime.now() - start))  # diagnostics

def termtemplateRender(term, examples):
    # Basic variables configuring UI
    extra_vars = {
        'title': term.label,
        'menu_sel': "Schemas",
        'home_page': "False",
        'docsdir': TERMDOCSDIR,
        'term': term,
        'jsonldPayload': SdoTermSource.getTermAsRdfString(term.id,
                                                          "json-ld",
                                                          full=True),
        'examples': examples
    }
    return templateRender("terms/TermPage.j2", extra_vars)

def homePage(page):
    global STRCLASSVAL
    title = SITENAME
    template = "docs/Home.j2"
    filt = None
    overrideclassval = None
    if page == "PendingHome":
        title += " - Pending"
        template = "docs/PendingHome.j2"
        filt = "pending"
        overrideclassval = 'class="ext ext-pending"'
    elif page == "AtticHome":
        title += " - Retired"
        template = "docs/AtticHome.j2"
        filt = "attic"
        overrideclassval = 'class="ext ext-attic"'

    sectionterms = {}
    termcount = 0
    if filt:
        terms = SdoTermSource.getAllTerms(layer=filt, expanded=True)
        terms.sort(key=lambda u: (u.category, u.id))
        first = True
        cat = None
        for t in terms:
            if first or t.category != cat:
                # New category: start a fresh set of per-type lists
                first = False
                cat = t.category
                ttypes = {}
                sectionterms[cat] = ttypes
                ttypes[SdoTerm.TYPE] = []
                ttypes[SdoTerm.PROPERTY] = []
                ttypes[SdoTerm.DATATYPE] = []
                ttypes[SdoTerm.ENUMERATION] = []
                ttypes[SdoTerm.ENUMERATIONVALUE] = []
            if t.termType == SdoTerm.REFERENCE:
                continue
            ttypes[t.termType].append(t)
            termcount += 1

    extra_vars = {
        'home_page': "True",
        'title': title,
        'termcount': termcount,
        'sectionterms': sectionterms
    }
    STRCLASSVAL = overrideclassval
    ret = docsTemplateRender(template, extra_vars)
    STRCLASSVAL = None
    return ret

def exportrdf(exportType):
    global allGraph, currentGraph

    if not allGraph:
        allGraph = rdflib.Graph()
        allGraph.bind("schema", VOCABURI)
        currentGraph = rdflib.Graph()
        currentGraph.bind("schema", VOCABURI)

        allGraph += SdoTermSource.sourceGraph()

        protocol, altprotocol = protocols()

        # Remove triples whose subject is outside the schema.org namespace
        deloddtriples = """DELETE {?s ?p ?o}
            WHERE {
                ?s ?p ?o.
                FILTER (! strstarts(str(?s), "%s://schema.org") ).
            }""" % (protocol)
        allGraph.update(deloddtriples)
        currentGraph += allGraph

        desuperseded = """PREFIX schema: <%s://schema.org/>
        DELETE {?s ?p ?o}
        WHERE{
            ?s ?p ?o;
                schema:supersededBy ?sup.
        }""" % (protocol)
        # Currently superseded terms are not suppressed from 'current' file dumps
        # Whereas they are suppressed from the UI
        #currentGraph.update(desuperseded)

        delattic = """PREFIX schema: <%s://schema.org/>
        DELETE {?s ?p ?o}
        WHERE{
            ?s ?p ?o;
                schema:isPartOf <%s://attic.schema.org>.
        }""" % (protocol, protocol)
        currentGraph.update(delattic)

    formats = ["json-ld", "turtle", "nt", "nquads", "rdf"]
    extype = exportType[len("RDFExport."):]
    if exportType == "RDFExports":
        for fmt in sorted(formats):
            _exportrdf(fmt, allGraph, currentGraph)
    elif extype in formats:
        _exportrdf(extype, allGraph, currentGraph)
    else:
        raise Exception("Unknown export format: %s" % exportType)

def test_zeroCommentCount(self):
    query = """
    SELECT ?term ?comment WHERE {
        ?term a ?type.
        FILTER NOT EXISTS { ?term rdfs:comment ?comment. }
        FILTER (strStarts(str(?term),"%s"))
    }
    ORDER BY ?term""" % VOCABURI
    ndi1_results = SdoTermSource.query(query)
    if len(ndi1_results) > 0:
        for row in ndi1_results:
            log.info("WARNING term %s has no rdfs:comment value" %
                     (row["term"]))
    self.assertEqual(
        len(ndi1_results), 0,
        "Found: %s term(s) without comment value" % len(ndi1_results))

def test_multiCommentCount(self):
    query = """
    SELECT ?term ?comment WHERE {
        ?term a ?type;
            rdfs:comment ?comment.
        FILTER (strStarts(str(?term),"%s"))
    }
    GROUP BY ?term
    HAVING (count(DISTINCT ?comment) > 1)
    ORDER BY ?term""" % VOCABURI
    ndi1_results = SdoTermSource.query(query)
    if len(ndi1_results) > 0:
        for row in ndi1_results:
            log.info("WARNING term %s has rdfs:comment value %s" %
                     (row["term"], row["comment"]))
    self.assertEqual(
        len(ndi1_results), 0,
        "Found: %s term(s) with multiple comment values" % len(ndi1_results))

def test_inverseDualPath(self):
    self.assertEqual(
        len(SdoTermSource.getParentPathTo("Thing", "Restaurant")), 0,
        "0 supertype paths from Thing to Restaurant.")

def test_dualPath(self):
    self.assertEqual(
        len(SdoTermSource.getParentPathTo("Restaurant", "Thing")), 2,
        "2 supertype paths from Restaurant to Thing.")

def test_simplePath(self):
    self.assertEqual(
        len(SdoTermSource.getParentPathTo("CreativeWork", "Thing")), 1,
        "1 supertype path from CreativeWork to Thing.")

def test_article_non_multiple_supertypes(self):
    fred = SdoTermSource.getTerm("Article")
    self.assertFalse(
        len(fred.supers) > 1, "Article only has one direct supertype.")

def test_restaurant_non_multiple_supertypes(self):
    fred = SdoTermSource.getTerm("Restaurant")
    self.assertFalse(
        len(fred.supers) > 1, "Restaurant only has one *direct* supertype.")

def test_localbusiness2supertypes(self):
    fred = SdoTermSource.getTerm("LocalBusiness")
    self.assertTrue(
        len(fred.supers) > 1,
        "LocalBusiness is subClassOf Place + Organization.")

def test_EventCancelledIsEnumerationValue(self):
    eEventCancelled = SdoTermSource.getTerm("EventCancelled")
    self.assertTrue(eEventCancelled.termType == SdoTerm.ENUMERATIONVALUE,
                    "EventCancelled is an Enumeration value.")

def test_EventStatusTypeIsntEnumerationValue(self):
    eEventStatusType = SdoTermSource.getTerm("EventStatusType")
    self.assertFalse(eEventStatusType.termType == SdoTerm.ENUMERATIONVALUE,
                     "EventStatusType is not an Enumeration value.")