def process_weather(station, field_to_process):
    """Compute a centered 3-row rolling mean of one weather metric for a
    station and write it back into the ``weather.average`` column.

    :param station: station_id whose rows are read and updated
    :param field_to_process: name of the numeric column to average.
        This is a SQL *identifier* and cannot be bound as a parameter,
        so it must come from trusted code, never from user input.
    """
    # NOTE(review): the original passed conn_name_attr='sqlite_default'.
    # That kwarg overrides the *name* of the attribute the hook reads the
    # connection id from -- it does not select a connection. The intended
    # kwarg is sqlite_conn_id; confirm against the Airflow version in use.
    db_hook = SqliteHook(sqlite_conn_id='sqlite_default')

    # Bind `station` as a query parameter instead of interpolating it into
    # the SQL text (the original f-string was injectable).
    weather_select = (f'select record_date, {field_to_process} '
                      'from weather where station_id=? '
                      'order by record_date;')
    data = db_hook.get_pandas_df(weather_select, parameters=(station,))

    # Centered window of 3 rows; the first and last rows get NaN averages.
    average = (data.rolling(3, center=True)
                   .mean()
                   .rename(columns={field_to_process: 'average'}))
    data = data.merge(average, left_index=True, right_index=True)

    weather_update = """ update weather set average = ? where station_id=? and record_date=?; """

    # itertuples() is the cheapest per-row walk pandas offers; the real
    # bottleneck here is db_hook.run() opening a connection per statement,
    # not the iteration strategy (the original comment claimed otherwise).
    for row in data.itertuples():
        db_hook.run(weather_update,
                    parameters=(row.average, station, row.record_date))
def getdf():
    """Pull the customer/order/catalog/product join from SQLite, echo it,
    and dump it as CSV to ``SourceData.txt`` under ``BASE_DIR``."""
    import os.path

    hook = SqliteHook(conn_id)
    frame = hook.get_pandas_df(
        "SELECT A.DocumentNo, A.FullName, A.Device, A.Country, B.OrderId, B.DocumentNo, B.OrderDate, B.CatalogId,C.CatalogId, C.ProductId, C.CUID, D.ProductId, D.ProductName, D.CUID FROM CUSTOMERS as A inner join ORDERS as B on A.DocumentNo = B.DocumentNo inner join CATALOG as C on C.CatalogId = B.CatalogId inner join PRODUCTS AS D ON C.CUID = D.CUID"
    )
    print(frame)

    target = os.path.join(BASE_DIR, "SourceData.txt")
    frame.to_csv(target, index=False)
def predict(classifier, **context):
    """Predict a Conference label for every row of ``mct_talks`` using the
    model previously saved under the classifier's name, then overwrite the
    ``mct_talks`` table with the augmented frame.
    """
    # Restore the persisted model keyed by the classifier's class name.
    model = _load_model(classifier.__name__)

    hook = SqliteHook()
    talks = hook.get_pandas_df('select * from mct_talks')

    # Predict from the raw title strings and attach the result column.
    titles = talks['Title'].tolist()
    talks['Conference'] = model.predict(titles)

    # Replace the table wholesale with the predicted frame.
    with hook.get_conn() as connection:
        talks.to_sql('mct_talks', con=connection, if_exists='replace')
def split_data(**context):
    """Split ``research_papers`` into train/test partitions, stratified on
    the Conference column, and persist them as the ``training_data`` and
    ``test_data`` tables in the SQLite DB.
    """
    hook = SqliteHook()
    papers = hook.get_pandas_df('select * from research_papers')

    # Deterministic 67/33 stratified split over the row index.
    train_idx, _ = train_test_split(
        papers.index,
        test_size=0.33,
        stratify=papers['Conference'],
        random_state=42,
    )

    # Training rows are selected by position; the test set is simply
    # everything that is not in the training index.
    with hook.get_conn() as connection:
        papers.iloc[train_idx].to_sql(
            'training_data', con=connection, if_exists='replace')
        papers.drop(train_idx).to_sql(
            'test_data', con=connection, if_exists='replace')