Dataset Scrubbing Utilities
Perform dataset scrubbing actions and return the scrubbed dataset as a ready-to-go data feed. This is an approach for normalizing an internal data feed.
Supported environment variables:
# verbose logging in this module
# note this can take longer to transform
# DataFrames and is not recommended for
# production:
export DEBUG_FETCH=1
Ingress scrubbing converts an incoming dataset (from IEX) into one of the following data feed types and returns it as a pandas DataFrame:
DATAFEED_DAILY = 900
DATAFEED_MINUTE = 901
DATAFEED_QUOTE = 902
DATAFEED_STATS = 903
DATAFEED_PEERS = 904
DATAFEED_NEWS = 905
DATAFEED_FINANCIALS = 906
DATAFEED_EARNINGS = 907
DATAFEED_DIVIDENDS = 908
DATAFEED_COMPANY = 909
DATAFEED_PRICING_YAHOO = 1100
DATAFEED_OPTIONS_YAHOO = 1101
DATAFEED_NEWS_YAHOO = 1102
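When logging or validating a `datafeed_type`, it can help to map the integer back to a readable name. The following is a minimal sketch that mirrors the values listed above in a hypothetical `DataFeed` enum. Both the enum and the `datafeed_name` helper are illustrative assumptions and are not part of `analysis_engine`, which exposes these values as `DATAFEED_*` constants.

```python
from enum import IntEnum


class DataFeed(IntEnum):
    """Hypothetical enum mirroring the documented DATAFEED_* values."""
    DAILY = 900
    MINUTE = 901
    QUOTE = 902
    STATS = 903
    PEERS = 904
    NEWS = 905
    FINANCIALS = 906
    EARNINGS = 907
    DIVIDENDS = 908
    COMPANY = 909
    PRICING_YAHOO = 1100
    OPTIONS_YAHOO = 1101
    NEWS_YAHOO = 1102


def datafeed_name(datafeed_type):
    """Return a readable name for a datafeed type, or 'UNKNOWN(<value>)'."""
    try:
        return DataFeed(datafeed_type).name
    except ValueError:
        # The value is not one of the documented feed types
        return 'UNKNOWN({})'.format(datafeed_type)


print(datafeed_name(901))
print(datafeed_name(1102))
```

Because `IntEnum` members compare equal to plain integers, a member such as `DataFeed.MINUTE` could be used wherever the integer `901` is expected.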
analysis_engine.dataset_scrub_utils.debug_msg(label, datafeed_type, msg_format, date_str, df)
Debug helper for scrubbing handlers
Parameters:
- label – log label
- datafeed_type – fetch type
- msg_format – message to include
- date_str – date string
- df – pandas DataFrame or None
analysis_engine.dataset_scrub_utils.ingress_scrub_dataset(label, datafeed_type, df, date_str=None, msg_format=None, scrub_mode='sort-by-date', ds_id='no-id')
Scrub a pandas.DataFrame from an Ingress pricing service and return the resulting pandas.DataFrame
Parameters:
- label – log label
- datafeed_type – analysis_engine.iex.consts.DATAFEED_* type or analysis_engine.yahoo.consts.DATAFEED_* type:
    DATAFEED_DAILY = 900
    DATAFEED_MINUTE = 901
    DATAFEED_QUOTE = 902
    DATAFEED_STATS = 903
    DATAFEED_PEERS = 904
    DATAFEED_NEWS = 905
    DATAFEED_FINANCIALS = 906
    DATAFEED_EARNINGS = 907
    DATAFEED_DIVIDENDS = 908
    DATAFEED_COMPANY = 909
    DATAFEED_PRICING_YAHOO = 1100
    DATAFEED_OPTIONS_YAHOO = 1101
    DATAFEED_NEWS_YAHOO = 1102
- df – pandas DataFrame
- date_str – date string for simulating historical dates, or datetime.datetime.now() if not set
- msg_format – msg format for a string.format()
- scrub_mode – mode to scrub this dataset
- ds_id – dataset identifier
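The source does not spell out what the default `scrub_mode='sort-by-date'` does beyond its name. Assuming it orders rows chronologically by a date column, the idea can be sketched without pandas as shown here. The helper `sort_records_by_date`, the `date` column name, the date format, and the sample rows are all illustrative assumptions rather than the library's implementation.

```python
from datetime import datetime


def sort_records_by_date(records, date_col='date', date_format='%Y-%m-%d'):
    """Return a new list of row dicts ordered oldest-first by date_col.

    Hypothetical helper: date_col and date_format are assumed defaults.
    """
    return sorted(
        records,
        key=lambda row: datetime.strptime(row[date_col], date_format))


# Made-up rows, deliberately out of order
rows = [
    {'date': '2019-01-03', 'close': 2.0},
    {'date': '2019-01-02', 'close': 1.0},
    {'date': '2019-01-04', 'close': 3.0},
]
sorted_rows = sort_records_by_date(rows)
for row in sorted_rows:
    print(row['date'], row['close'])
```

With a pandas.DataFrame, a comparable step would be to parse the column with `pandas.to_datetime` and then call `df.sort_values(by='date')`.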
analysis_engine.dataset_scrub_utils.extract_scrub_dataset(label, datafeed_type, df, date_str=None, msg_format=None, scrub_mode='sort-by-date', ds_id='no-id')
Scrub a cached pandas.DataFrame that was stored in Redis and return the resulting pandas.DataFrame
Parameters:
- label – log label
- datafeed_type – analysis_engine.iex.consts.DATAFEED_* type or analysis_engine.yahoo.consts.DATAFEED_* type:
    DATAFEED_DAILY = 900
    DATAFEED_MINUTE = 901
    DATAFEED_QUOTE = 902
    DATAFEED_STATS = 903
    DATAFEED_PEERS = 904
    DATAFEED_NEWS = 905
    DATAFEED_FINANCIALS = 906
    DATAFEED_EARNINGS = 907
    DATAFEED_DIVIDENDS = 908
    DATAFEED_COMPANY = 909
    DATAFEED_PRICING_YAHOO = 1100
    DATAFEED_OPTIONS_YAHOO = 1101
    DATAFEED_NEWS_YAHOO = 1102
- df – pandas DataFrame
- date_str – date string for simulating historical dates, or datetime.datetime.now() if not set
- msg_format – msg format for a string.format()
- scrub_mode – mode to scrub this dataset
- ds_id – dataset identifier
analysis_engine.dataset_scrub_utils.build_dates_from_df_col(df, use_date_str, src_col='minute', src_date_format='%Y-%m-%d %H:%M:%S', output_date_format='%Y-%m-%d %H:%M:%S')
Converts a string date column series in a pandas.DataFrame to a list of well-formed date strings.
Parameters:
- src_col – source column name
- use_date_str – date string for today
- src_date_format – format of the strings in the df[src_col] column
- output_date_format – write the new date strings in this format
- df – source pandas.DataFrame
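The roles of `src_date_format` and `output_date_format` can be illustrated with the standard library alone. This is a minimal sketch, not the function's actual implementation: `convert_date_strings` is a hypothetical helper that takes a plain list of strings in place of `df[src_col]`, and it leaves out `use_date_str` because the source does not describe how that value is combined with the column.

```python
from datetime import datetime


def convert_date_strings(values,
                         src_date_format='%Y-%m-%d %H:%M:%S',
                         output_date_format='%Y-%m-%d %H:%M:%S'):
    """Parse each string with src_date_format and rewrite it in output_date_format."""
    return [
        datetime.strptime(value, src_date_format).strftime(output_date_format)
        for value in values
    ]


# Stand-in for the values of df['minute'] (made-up sample data)
minute_col = ['2019-02-15 09:30:00', '2019-02-15 09:31:00']
converted = convert_date_strings(
    minute_col,
    output_date_format='%Y-%m-%d %H:%M')
print(converted)
```

A value that does not match `src_date_format` makes `datetime.strptime` raise `ValueError`, so malformed rows surface immediately instead of passing through unchanged.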