def condense(self):
    """
    KEEP ONLY THE MOST RECENT VERSION OF EACH RECORD, BY id
    """
    # MAKE NEW SHARD
    partition = JoinSQL(
        SQL_COMMA,
        [
            quote_column(c.es_field)
            for f in listwrap(self.id.field)
            for c in self.flake.leaves(f)
        ],
    )
    order_by = JoinSQL(
        SQL_COMMA,
        [
            ConcatSQL(quote_column(c.es_field), SQL_DESC)
            for f in listwrap(self.id.version)
            for c in self.flake.leaves(f)
        ],
    )
    # WRAP WITH etl.timestamp BEST SELECTION
    self.container.query_and_wait(ConcatSQL(
        SQL(  # SOME KEYWORDS: ROWNUM RANK
            "SELECT * EXCEPT (_rank) FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY "
        ),
        partition,
        SQL_ORDERBY,
        order_by,
        SQL(") AS _rank FROM "),
        quote_column(self.full_name),
        SQL(") a WHERE _rank=1"),
    ))
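The query built above is the standard ROW_NUMBER() dedup pattern: rank rows within each id-partition, newest version first, and keep only rank 1. A minimal plain-string sketch of the same shape (table and column names are hypothetical):

```python
def dedup_sql(table, id_cols, version_cols):
    # Rank rows within each id-partition, newest version first,
    # then keep only the top-ranked row (EXCEPT drops the helper column)
    partition = ", ".join(id_cols)
    order_by = ", ".join(c + " DESC" for c in version_cols)
    return (
        "SELECT * EXCEPT (_rank) FROM ("
        "SELECT *, ROW_NUMBER() OVER (PARTITION BY " + partition
        + " ORDER BY " + order_by + ") AS _rank FROM " + table
        + ") a WHERE _rank=1"
    )
```

Note `SELECT * EXCEPT (...)` is BigQuery syntax; most other dialects need the surviving columns listed explicitly.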
def quote_value(value):
    """
    convert value to mysql code for the same

    mostly delegate directly to the mysql lib, but some exceptions exist
    """
    try:
        if value == None:
            return SQL_NULL
        elif isinstance(value, SQL):
            return value
        elif is_text(value):
            return SQL("'" + "".join(ESCAPE_DCT.get(c, c) for c in value) + "'")
        elif is_data(value):
            return quote_value(json_encode(value))
        elif isinstance(value, datetime):
            return SQL("str_to_date('" + value.strftime("%Y%m%d%H%M%S.%f") + "', '%Y%m%d%H%i%s.%f')")
        elif isinstance(value, Date):
            return SQL("str_to_date('" + value.format("%Y%m%d%H%M%S.%f") + "', '%Y%m%d%H%i%s.%f')")
        elif is_number(value):
            return SQL(text(value))
        elif hasattr(value, "__iter__"):
            return quote_value(json_encode(value))
        else:
            return quote_value(text(value))
    except Exception as e:
        Log.error("problem quoting SQL {{value}}", value=repr(value), cause=e)
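The text branch escapes character-by-character through a lookup table. A self-contained sketch of that technique; the ESCAPE map below is an assumption standing in for the module's ESCAPE_DCT:

```python
# Hypothetical stand-in for ESCAPE_DCT: map MySQL-special characters
# to their backslash-escaped forms; everything else passes through
ESCAPE = {
    "\0": "\\0",
    "\\": "\\\\",
    "'": "\\'",
    '"': '\\"',
    "\n": "\\n",
    "\r": "\\r",
    "\t": "\\t",
}

def quote_text(value):
    # Wrap in single quotes, escaping each special character on the way
    return "'" + "".join(ESCAPE.get(c, c) for c in value) + "'"
```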
def sql_create(table, properties, primary_key=None, unique=None):
    """
    :param table: NAME OF THE TABLE TO CREATE
    :param properties: DICT WITH {name: type} PAIRS (type can be plain text)
    :param primary_key: COLUMNS THAT MAKE UP THE PRIMARY KEY
    :param unique: COLUMNS THAT SHOULD BE UNIQUE
    :return: SQL TO CREATE THE TABLE
    """
    acc = [
        SQL_CREATE,
        quote_column(table),
        SQL_OP,
        sql_list([quote_column(k) + SQL(v) for k, v in properties.items()]),
    ]
    if primary_key:
        acc.append(SQL_COMMA)
        acc.append(SQL(" PRIMARY KEY "))
        acc.append(sql_iso(sql_list([quote_column(c) for c in listwrap(primary_key)])))
    if unique:
        acc.append(SQL_COMMA)
        acc.append(SQL(" UNIQUE "))
        acc.append(sql_iso(sql_list([quote_column(c) for c in listwrap(unique)])))
    acc.append(SQL_CP)
    return ConcatSQL(*acc)
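For illustration, the same statement shape rendered with plain strings (backtick quoting assumed, as in MySQL; the helper name is mine):

```python
def create_table_sql(table, properties, primary_key=None):
    # properties is a {name: type} dict, as in sql_create above
    cols = ", ".join("`%s` %s" % (k, v) for k, v in properties.items())
    sql = "CREATE TABLE `" + table + "` (" + cols
    if primary_key:
        keys = ", ".join("`%s`" % c for c in primary_key)
        sql += ", PRIMARY KEY (" + keys + ")"
    return sql + ")"
```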
def with_var(var, expression, eval):
    """
    :param var: NAME GIVEN TO expression
    :param expression: THE EXPRESSION TO COMPUTE FIRST
    :param eval: THE EXPRESSION TO COMPUTE SECOND, WITH var ASSIGNED
    :return: SQL EXPRESSION
    """
    return ConcatSQL(
        SQL("WITH x AS (SELECT ("),
        expression,
        SQL(") AS "),
        var,
        SQL(") SELECT "),
        eval,
        SQL(" FROM x"),
    )
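The generated shape can be exercised directly against SQLite, which accepts the same `WITH` form. A plain-string version of the template (helper name is mine):

```python
import sqlite3

def with_var_sql(var, expression, body):
    # Same template as with_var above, rendered as a plain string:
    # bind `expression` to `var`, then evaluate `body` with var in scope
    return "WITH x AS (SELECT (" + expression + ") AS " + var + ") SELECT " + body + " FROM x"

con = sqlite3.connect(":memory:")
result = con.execute(with_var_sql("i", "1+1", "i*10")).fetchone()[0]
```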
def to_sql(self, schema, not_null=False, boolean=False):
    value = SQLang[self.value].partial_eval().to_sql(schema)[0].sql.s
    find = SQLang[self.find].partial_eval().to_sql(schema)[0].sql.s
    start = SQLang[self.start].partial_eval().to_sql(schema)[0].sql.n
    default = coalesce(
        SQLang[self.default].partial_eval().to_sql(schema)[0].sql.n, SQL_NULL
    )

    if start.sql != SQL_ZERO.sql:
        value = NotRightOp([self.value, self.start]).to_sql(schema)[0].sql.s

    index = sql_call("INSTR", value, find)
    i = quote_column("i")

    sql = with_var(
        i,
        index,
        ConcatSQL(
            SQL_CASE,
            SQL_WHEN,
            i,
            SQL_THEN,
            i,
            SQL(" - "),
            SQL_ONE,
            SQL_PLUS,
            start,
            SQL_ELSE,
            default,
            SQL_END,
        ),
    )
    return wrap([{"name": ".", "sql": {"n": sql}}])
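The CASE expression above converts INSTR's 1-based, 0-on-miss result into find-with-default semantics. The behavior it aims for, in plain Python:

```python
def find_op(value, find, start=0, default=None):
    # INSTR is 1-based and returns 0 on a miss; the SQL CASE above
    # turns that into a 0-based index, falling back to `default`
    i = value.find(find, start)
    return i if i != -1 else default
```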
def to_bq(self, schema, not_null=False, boolean=False):
    acc = []
    for term in self.terms:
        sqls = BQLang[term].to_bq(schema)
        if len(sqls) > 1:
            acc.append(SQL_TRUE)
        else:
            for t, v in sqls[0].sql.items():
                if t in ["b", "s", "n"]:
                    acc.append(ConcatSQL(
                        SQL_CASE,
                        SQL_WHEN,
                        sql_iso(v),
                        SQL_IS_NULL,
                        SQL_THEN,
                        SQL_ZERO,
                        SQL_ELSE,
                        SQL_ONE,
                        SQL_END,
                    ))
                else:
                    acc.append(SQL_TRUE)

    if not acc:
        return wrap([{}])
    else:
        return wrap([{"name": ".", "sql": {"n": SQL("+").join(acc)}}])
def get_subscriber(self, name=None):
    """
    GET SUBSCRIBER BY id, OR BY QUEUE name
    """
    with self.db.transaction() as t:
        result = t.query(SQL(f"""
            SELECT MIN(s.id) AS id
            FROM {SUBSCRIBER} AS s
            LEFT JOIN {QUEUE} AS q ON q.id = s.queue
            WHERE q.name = {quote_value(name)}
            GROUP BY s.queue
        """))
        if not result:
            Log.error("not expected")
        queue = self.get_or_create_queue(name)
        sub_info = t.query(sql_query(
            {"from": SUBSCRIBER, "where": {"eq": {"id": first_row(result).id}}}
        ))
        return Subscription(queue=queue, kwargs=first_row(sub_info))
def test_extract_job(complex_job, extract_job_settings):
    """
    If you find this test failing, then copy the JSON in the test failure into
    the test_extract_job.json file, then you may use the diff to review the
    changes.
    """
    with MySQL(extract_job_settings.source.database) as source:
        with MySqlSnowflakeExtractor(extract_job_settings.source) as extractor:
            sql = extractor.get_sql(SQL("SELECT " + text(complex_job.id) + " as id"))

            acc = []
            with source.transaction():
                cursor = list(source.query(sql, stream=True, row_tuples=True))
                extractor.construct_docs(cursor, acc.append, False)

    doc = first(acc)
    doc.guid = first(JOB).guid  # NEW EACH TIME
    job_guid = first(jx.drill(JOB, "job_log.failure_line.job_guid"))
    for fl in jx.drill(doc, "job_log.failure_line"):
        fl.job_guid = job_guid

    assertAlmostEqual(
        acc,
        JOB,
        places=4,  # TH MIXES LOCAL TIMEZONE WITH GMT: https://bugzilla.mozilla.org/show_bug.cgi?id=1612603
    )
def test_extract_alert_sql(extract_alert_settings, test_perf_alert_summary, test_perf_alert):
    """
    If you find this test failing, then replace the contents of
    test_extract_alerts.sql with the contents of the `sql` variable below.
    You can then review the resulting diff.
    """
    p = test_perf_alert
    s2 = PerformanceAlertSummary.objects.create(
        id=2,
        repository=test_perf_alert_summary.repository,
        prev_push_id=3,
        push_id=4,
        created=datetime.datetime.now(),
        framework=test_perf_alert_summary.framework,
        manually_created=False,
    )

    # set related summary with downstream status, make sure that works
    p.status = PerformanceAlert.DOWNSTREAM
    p.related_summary = s2
    p.save()

    extractor = MySqlSnowflakeExtractor(extract_alert_settings.source)
    sql = extractor.get_sql(SQL("SELECT 0"))
    assert "".join(sql.sql.split()) == "".join(EXTRACT_ALERT_SQL.split())
def test_extract_alert(extract_alert_settings, test_perf_alert_summary, test_perf_alert):
    """
    If you find this test failing, then copy the JSON in the test failure into
    the test_extract_alerts.json file, then you may use the diff to review the
    changes.
    """
    now = datetime.datetime.now()
    source = MySQL(extract_alert_settings.source.database)
    extractor = MySqlSnowflakeExtractor(extract_alert_settings.source)
    sql = extractor.get_sql(SQL("SELECT " + text(test_perf_alert_summary.id) + " as id"))

    acc = []
    with source.transaction():
        cursor = list(source.query(sql, stream=True, row_tuples=True))
        extractor.construct_docs(cursor, acc.append, False)

    doc = acc[0]
    # TESTS ARE RUN WITH CURRENT TIMESTAMPS
    doc.created = now
    doc.last_updated = now
    for d in doc.details:
        d.created = now
        d.last_updated = now
        d.series_signature.last_updated = now

    assertAlmostEqual(
        acc, ALERT, places=3
    )  # TH MIXES LOCAL TIMEZONE WITH GMT: https://bugzilla.mozilla.org/show_bug.cgi?id=1612603
def test_extract_job_sql(extract_job_settings, transactional_db):
    """
    VERIFY SQL OVER DATABASE
    """
    extractor = MySqlSnowflakeExtractor(extract_job_settings.source)
    sql = extractor.get_sql(SQL("SELECT 0"))
    assert "".join(sql.sql.split()) == "".join(EXTRACT_JOB_SQL.split())
def create_view(self, view_api_name, shard_api_name):
    job = self.query_and_wait(ConcatSQL(
        SQL("CREATE VIEW\n"),
        quote_column(view_api_name),
        SQL_AS,
        sql_query({"from": shard_api_name}),
    ))
def test_make_failure_class(failure_class, extract_job_settings):
    # TEST I CAN MAKE AN OBJECT IN THE DATABASE
    source = MySQL(extract_job_settings.source.database)
    with source.transaction():
        result = source.query(SQL("SELECT * from failure_classification"))

    # verify the failure_classification object is the one we expect
    assert result[0].name == "not classified"
def quote_value(value):
    if isinstance(value, (Mapping, list)):
        return SQL(".")
    elif isinstance(value, Date):
        return SQL(text(value.unix))
    elif isinstance(value, Duration):
        return SQL(text(value.seconds))
    elif is_text(value):
        return SQL("'" + value.replace("'", "''") + "'")
    elif value == None:
        return SQL_NULL
    elif value is True:
        return SQL_TRUE
    elif value is False:
        return SQL_FALSE
    else:
        return SQL(text(value))
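The text branch uses standard SQL quote doubling rather than backslash escaping. A quick round-trip check of that escaping against SQLite (the helper name is mine):

```python
import sqlite3

def quote_str(value):
    # Standard SQL escaping: double any embedded single quote
    return "'" + value.replace("'", "''") + "'"

con = sqlite3.connect(":memory:")
roundtrip = con.execute("SELECT " + quote_str("it's a 'test'")).fetchone()[0]
```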
def test_make_repository(test_repository, extract_job_settings):
    # TEST EXISTING FIXTURE MAKES AN OBJECT IN THE DATABASE
    source = MySQL(extract_job_settings.source.database)
    with source.transaction():
        result = source.query(SQL("SELECT * from repository"))

    # verify the repository object is the one we expect
    assert result[0].id == test_repository.id
    assert result[0].tc_root_url == test_repository.tc_root_url
def test_extract_job_sql(extract_job_settings, transactional_db):
    """
    VERIFY SQL OVER DATABASE

    If you find this test failing, then replace the contents of
    test_extract_job.sql with the contents of the `sql` variable below.
    You can then review the resulting diff.
    """
    with MySqlSnowflakeExtractor(extract_job_settings.source) as extractor:
        sql = extractor.get_sql(SQL("SELECT 0"))
        assert "".join(sql.sql.split()) == "".join(EXTRACT_JOB_SQL.split())
def quote_value(self, value):
    if value == None:
        return SQL_NULL
    if is_list(value):
        json = value2json(value)
        return self.quote_value(json)
    if is_text(value) and len(value) > 256:
        value = value[:256]
    return SQL(adapt(value))
def confirm(self, serial):
    with self.queue.broker.db.transaction() as t:
        t.execute(SQL(f"""
            DELETE FROM {UNCONFIRMED}
            WHERE subscriber = {quote_value(self.id)}
            AND serial = {quote_value(serial)}
        """))
        t.execute(SQL(f"""
            UPDATE {SUBSCRIBER}
            SET last_confirmed_serial = COALESCE(
                (
                    SELECT min(serial)
                    FROM {UNCONFIRMED} AS u
                    WHERE u.subscriber = {quote_value(self.id)}
                ),
                next_emit_serial
            ) - 1
        """))
def quote_sql(value, param=None):
    """
    USED TO EXPAND THE PARAMETERS TO THE SQL() OBJECT
    """
    try:
        if isinstance(value, SQL):
            if not param:
                return value
            param = {k: quote_sql(v) for k, v in param.items()}
            return SQL(expand_template(value, param))
        elif is_text(value):
            return SQL(value)
        elif is_data(value):
            return quote_value(json_encode(value))
        elif hasattr(value, "__iter__"):
            return quote_list(value)
        else:
            return text(value)
    except Exception as e:
        Log.error("problem quoting SQL", cause=e)
def single(col, r):
    min = coalesce(r["gte"], r[">="])
    max = coalesce(r["lte"], r["<="])
    if min != None and max != None:
        # SPECIAL CASE (BETWEEN)
        sql = quote_column(col) + SQL(" BETWEEN ") + quote_value(min) + SQL_AND + quote_value(max)
    else:
        sql = SQL_AND.join(
            quote_column(col) + name2sign[sign] + quote_value(value)
            for sign, value in r.items()
        )
    return sql
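A plain-string sketch of the same range-to-SQL logic; the SIGNS operator map is an assumption standing in for name2sign:

```python
# Hypothetical operator map, standing in for name2sign above
SIGNS = {"gte": ">=", ">=": ">=", "lte": "<=", "<=": "<=", "gt": ">", "lt": "<"}

def range_sql(col, r):
    lo = r.get("gte", r.get(">="))
    hi = r.get("lte", r.get("<="))
    if lo is not None and hi is not None:
        # SPECIAL CASE (BETWEEN), as in `single` above
        return "%s BETWEEN %s AND %s" % (col, lo, hi)
    return " AND ".join("%s %s %s" % (col, SIGNS[s], v) for s, v in r.items())
```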
def run_compare(self, config, id_sql, expected):
    db = MySQL(**config.database)
    extractor = MySqlSnowflakeExtractor(kwargs=config)
    sql = extractor.get_sql(SQL(id_sql))

    result = []
    with db.transaction():
        cursor = db.query(sql, stream=True, row_tuples=True)
        cursor = list(cursor)
        extractor.construct_docs(cursor, result.append, False)

    self.assertEqual(result, expected, "expecting identical")
    self.assertEqual(expected, result, "expecting identical")
def to_bq(self, schema, not_null=False, boolean=False):
    term = BQLang[self.term].partial_eval()
    if is_literal(term):
        val = term.value
        if isinstance(val, text):
            sql = quote_value(len(val))
        elif isinstance(val, (float, int)):
            sql = quote_value(len(value2json(val)))
        else:
            return Null
    else:
        value = term.to_bq(schema, not_null=not_null)[0].sql.s
        sql = ConcatSQL(SQL("LENGTH"), sql_iso(value))
    return wrap([{"name": ".", "sql": {"n": sql}}])
def create_view(self, view_api_name, shard_api_name):
    sql = ConcatSQL(
        SQL("CREATE VIEW\n"),
        quote_column(view_api_name),
        SQL_AS,
        sql_query({"from": shard_api_name}),
    )
    job = self.query_and_wait(sql)
    if job.errors:
        Log.error(
            "Can not create view\n{{sql}}\n{{errors|json|indent}}",
            sql=sql,
            errors=job.errors,
        )
def quote_column(*path):
    if DEBUG:
        if not path:
            Log.error("expecting a name")
        for p in path:
            if not is_text(p):
                Log.error("expecting strings, not {{type}}", type=p.__class__.__name__)
    try:
        return ConcatSQL(SQL_SPACE, JoinSQL(SQL_DOT, [SQL(quote(p)) for p in path]), SQL_SPACE)
    except Exception as e:
        Log.error("Not expected", cause=e)
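The quoting itself, sketched with ANSI double-quote style (the real `quote` may use a different dialect's delimiter; the helper name is mine):

```python
def quote_col(*path):
    # Quote each path segment and join with dots: ("db", "tbl") -> "db"."tbl"
    # Embedded double quotes are doubled, per the ANSI SQL convention
    return ".".join('"' + p.replace('"', '""') + '"' for p in path)
```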
def test_django_cannot_encode_datetime_strings(extract_job_settings):
    """
    DJANGO/MYSQL DATETIME MATH FAILS WHEN GIVEN STRINGS
    """
    epoch_string = Date.EPOCH.format()
    sql_query = SQL(str(
        (
            Job.objects
            .filter(Q(last_modified__gt=epoch_string) | (Q(last_modified=epoch_string) & Q(id__gt=0)))
            .annotate()
            .values("id")
            .order_by("last_modified", "id")[:2000]
        ).query
    ))
    source = MySQL(extract_job_settings.source.database)
    with pytest.raises(Exception):
        with source.transaction():
            list(source.query(sql_query, stream=True, row_tuples=True))
def test_django_cannot_encode_datetime(extract_job_settings):
    """
    DJANGO DOES NOT ENCODE THE DATETIME PROPERLY
    """
    epoch = Date(Date.EPOCH).datetime
    get_ids = SQL(str(
        (
            Job.objects
            .filter(Q(last_modified__gt=epoch) | (Q(last_modified=epoch) & Q(id__gt=0)))
            .annotate()
            .values("id")
            .order_by("last_modified", "id")[:2000]
        ).query
    ))
    source = MySQL(extract_job_settings.source.database)
    with pytest.raises(Exception):
        with source.transaction():
            list(source.query(get_ids, stream=True, row_tuples=True))
def to_sql(self, schema, not_null=False, boolean=False):
    prefix = SQLang[self.prefix].partial_eval()
    if is_literal(prefix):
        value = SQLang[self.value].partial_eval().to_sql(schema)[0].sql.s
        prefix = prefix.value
        if "%" in prefix or "_" in prefix:
            for r in "\\_%":
                prefix = prefix.replace(r, "\\" + r)
            sql = ConcatSQL(value, SQL_LIKE, quote_value(prefix + "%"), SQL_ESCAPE, SQL("\\"))
        else:
            sql = ConcatSQL(value, SQL_LIKE, quote_value(prefix + "%"))
        return wrap([{"name": ".", "sql": {"b": sql}}])
    else:
        return SqlEqOp([SqlInstrOp([self.value, prefix]), SQL_ONE]).partial_eval().to_sql()
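The wildcard escaping above, in isolation: backslash, underscore, and percent must each be escaped (backslash first, so the later escapes are not double-escaped) before the trailing `%` is appended. A self-contained sketch (helper name is mine):

```python
def like_prefix(prefix):
    # Escape backslash first, then the LIKE wildcards _ and %,
    # so user data cannot act as a pattern; append the real wildcard last
    for ch in "\\_%":
        prefix = prefix.replace(ch, "\\" + ch)
    return prefix + "%"
```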
def to_bq(self, schema, not_null=False, boolean=False):
    default = self.default.to_bq(schema)
    if len(self.terms) == 0:
        return default
    default = coalesce(default[0].sql.s, SQL_NULL)
    sep = BQLang[self.separator].to_bq(schema)[0].sql.s

    acc = []
    for t in self.terms:
        t = BQLang[t]
        missing = t.missing().partial_eval()

        term = t.to_bq(schema, not_null=True)[0].sql
        if term.s:
            term_sql = term.s
        elif term.n:
            term_sql = "cast(" + term.n + " as text)"
        else:
            term_sql = (
                SQL_CASE + SQL_WHEN + term.b + SQL_THEN + quote_value("true")
                + SQL_ELSE + quote_value("false") + SQL_END
            )

        if isinstance(missing, TrueOp):
            acc.append(SQL_EMPTY_STRING)
        elif missing:
            acc.append(
                SQL_CASE
                + SQL_WHEN
                + sql_iso(missing.to_bq(schema, boolean=True)[0].sql.b)
                + SQL_THEN
                + SQL_EMPTY_STRING
                + SQL_ELSE
                + sql_iso(sql_concat_text([sep, term_sql]))
                + SQL_END
            )
        else:
            acc.append(sql_concat_text([sep, term_sql]))

    expr_ = "SUBSTR" + sql_iso(sql_list([
        sql_concat_text(acc),
        LengthOp(self.separator).to_bq(schema)[0].sql.n + SQL("+1"),
    ]))

    return BQLScript(
        expr=expr_,
        data_type=STRING,
        frum=self,
        miss=self.missing(),
        many=False,
        schema=schema,
    )
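The SUBSTR construction above prepends the separator to every present term and then strips one leading separator, which is how SQL gets a join without a trailing or doubled delimiter. The same trick in plain Python (helper name is mine):

```python
def sep_join(sep, terms):
    # Prepend sep to each present term, then drop the leading sep --
    # the same trick SUBSTR(concat, LENGTH(sep)+1) performs in SQL
    concat = "".join(sep + t for t in terms if t is not None)
    return concat[len(sep):]
```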
def to_bq(self, schema, not_null=False, boolean=False):
    prefix = BQLang[self.prefix].partial_eval()
    if is_literal(prefix):
        value = BQLang[self.value].partial_eval().to_bq(schema)[0].sql.s
        prefix = prefix.to_bq(schema)[0].sql.s
        if "%" in prefix or "_" in prefix:
            for r in "\\_%":
                prefix = prefix.replaceAll(r, "\\" + r)
            sql = ConcatSQL(value, SQL_LIKE, prefix, SQL_ESCAPE, SQL("\\"))
        else:
            sql = ConcatSQL(value, SQL_LIKE, prefix)
        return wrap([{"name": ".", "sql": {"b": sql}}])
    else:
        return SqlEqOp(
            [SqlSubstrOp([self.value, ONE, LengthOp(prefix)]), prefix]
        ).partial_eval().to_bq()
def test_extract_job(complex_job, extract_job_settings, now):
    source = MySQL(extract_job_settings.source.database)
    extractor = MySqlSnowflakeExtractor(extract_job_settings.source)
    sql = extractor.get_sql(SQL("SELECT " + text(complex_job.id) + " as id"))

    acc = []
    with source.transaction():
        cursor = list(source.query(sql, stream=True, row_tuples=True))
        extractor.construct_docs(cursor, acc.append, False)

    doc = acc[0]
    doc.guid = complex_job.guid
    doc.last_modified = complex_job.last_modified

    assertAlmostEqual(
        acc, JOB, places=3
    )  # TH MIXES LOCAL TIMEZONE WITH GMT: https://bugzilla.mozilla.org/show_bug.cgi?id=1612603