sktutor package
Submodules
sktutor.preprocessing module
class sktutor.preprocessing.BitwiseOperator(operator, mapper)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Apply a bitwise operator, & or |, to a list of columns.
Parameters:
- mapper (dict) – A mapping from each new column to the list of old columns that the bitwise operator is applied to in order to define it.
- operator (str) – The name of the bitwise operator to apply. Acceptable inputs are 'and' and 'or'.
mapper takes the form:
    {'new_column1': ['old_column1', 'old_column2', 'old_column3'],
     'new_column2': ['old_column2', 'old_column4', 'old_column5']}
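As a rough illustration of what such a mapper describes, the sketch below combines boolean columns with pandas directly rather than through sktutor. The helper `apply_bitwise` and the sample data are hypothetical, and the class's actual implementation may differ.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical sample data: three boolean flag columns.
df = pd.DataFrame({
    "old_column1": [True, True, False],
    "old_column2": [True, False, False],
    "old_column3": [True, True, True],
})
mapper = {"new_column1": ["old_column1", "old_column2", "old_column3"]}


def apply_bitwise(frame, mapper, op):
    """Add one new column per mapper key by folding `op` over the old columns."""
    out = frame.copy()
    for new_col, old_cols in mapper.items():
        out[new_col] = reduce(op, (frame[c] for c in old_cols))
    return out


result_and = apply_bitwise(df, mapper, operator.and_)  # row-wise &
result_or = apply_bitwise(df, mapper, operator.or_)    # row-wise |
```

With 'and', a row is True only when every listed column is True; with 'or', it is True when at least one is.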
class sktutor.preprocessing.BoxCoxTransformer(adder=0)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create Box-Cox transformations on all columns.
Parameters: adder (numeric) – The amount to add to each column before the Box-Cox transformation.
    fit(X, y=None, **fit_params)
    Fit the transformer on X.
    Parameters: X (pandas DataFrame) – The input data.
    Returns: self.
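The role of adder can be seen in the standard Box-Cox formula, which is only defined for positive inputs. The sketch below applies that formula to a single value with a fixed lambda. The function `box_cox` and the explicit `lmbda` argument are assumptions for illustration: the constructor shown above exposes no lambda, and how the transformer chooses it is not stated here.

```python
import math


def box_cox(x, lmbda, adder=0):
    """Box-Cox transform of one value after shifting it by `adder`."""
    shifted = x + adder
    if shifted <= 0:
        raise ValueError("Box-Cox requires strictly positive values after the shift")
    if lmbda == 0:
        # The lambda -> 0 limit of the formula is the natural logarithm.
        return math.log(shifted)
    return (shifted ** lmbda - 1) / lmbda


log_case = box_cox(0, 0, adder=1)       # log(1)
linear_case = box_cox(3, 1, adder=1)    # (4 - 1) / 1
sqrt_case = box_cox(3, 0.5, adder=1)    # (2 - 1) / 0.5
```

A column containing zeros would fail without a shift, which is the situation a positive adder addresses.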
class sktutor.preprocessing.ColumnDropper(col)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Drop a list of columns from a DataFrame.
Parameters: col (list of strings) – A list of columns to drop from the DataFrame.
class sktutor.preprocessing.ColumnExtractor(col)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Extract a list of columns from a DataFrame.
Parameters: col (list of strings) – A list of columns to extract from the DataFrame.
class sktutor.preprocessing.ColumnNameCleaner
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Replace spaces and formula symbols in column names that conflict with patsy formula interpretation.
class sktutor.preprocessing.ColumnValidator
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Ensure that the transformed dataset has the same columns, in the same order, as the dataset used for fitting. This can be useful as a check at the beginning and end of pipelines.
class sktutor.preprocessing.ContinuousFeatureBinner(field, bins, right_inclusive=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create bins for continuous features.
Parameters:
- field (string) – The continuous field for which to create bins.
- bins (array-like) – The criteria to bin by.
- right_inclusive (bool) – Whether each interval includes its right edge.
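To show what right-inclusive means in practice, here is a small sketch using pandas.cut on hypothetical data. It assumes binning semantics similar to pandas.cut; whether ContinuousFeatureBinner relies on that function internally is not stated in this reference.

```python
import pandas as pd

# Hypothetical data: the value 5 sits exactly on a bin edge.
ages = pd.Series([1, 5, 9], name="age")
bins = [0, 5, 10]

# right=True builds (0, 5] and (5, 10]; right=False builds [0, 5) and [5, 10).
right_closed = pd.cut(ages, bins=bins, right=True)
left_closed = pd.cut(ages, bins=bins, right=False)

# Integer position of the bin each value fell into.
right_codes = right_closed.cat.codes.tolist()
left_codes = left_closed.cat.codes.tolist()
```

The edge value 5 lands in the first bin when intervals are right-inclusive and in the second bin otherwise.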
class sktutor.preprocessing.DummyCreator(**kwargs)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create dummy variables from categorical variables.
Parameters:
- dummy_na (boolean) – Add a column to indicate NaNs. If False, NaNs are ignored.
- drop_first (boolean) – Whether to get k-1 dummies out of k categorical levels by removing the first level.
    fit(X, y=None, **fit_params)
    Fit the dummy creator on X. Retains a record of the columns produced with the fitting data.
    Parameters: X (pandas DataFrame) – The input data.
    Returns: self.
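Retaining the columns seen at fit time matters because new data may not contain every category. The sketch below reproduces that idea with pandas.get_dummies and reindex on hypothetical data; it is an assumption about the general technique, not a copy of the class's code.

```python
import pandas as pd

# Hypothetical fitting data with two levels, and new data with only one.
train = pd.DataFrame({"color": ["red", "blue", "red"]})
new = pd.DataFrame({"color": ["red"]})

# "Fit": remember which dummy columns the fitting data produced.
fit_columns = list(
    pd.get_dummies(train, dummy_na=False, drop_first=False).columns
)

# "Transform": encode the new data, then align it to the remembered columns.
aligned = (
    pd.get_dummies(new, dummy_na=False, drop_first=False)
    .reindex(columns=fit_columns, fill_value=0)
)
```

Without the alignment step, the new data would produce a single dummy column and no longer match the layout a downstream model was trained on.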
class sktutor.preprocessing.FactorLimiter(factors_per_column=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
For each named column, limit factors to a list of acceptable values. Non-conforming factors, including missing values, are replaced by a default value.
Parameters: factors_per_column (dictionary) – A dictionary mapping each column name to a dictionary that holds a list of acceptable factor values and a default factor value for non-conforming values.
factors_per_column takes the form:
    {'column_name': {'factors': ['value1', 'value2', 'value3'],
                     'default': 'value1'}}
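The replacement rule described above can be sketched with plain pandas. The column name, values, and loop below are hypothetical and only illustrate the behavior the docstring describes.

```python
import pandas as pd

# Hypothetical specification in the factors_per_column format.
factors_per_column = {
    "fruit": {"factors": ["apple", "pear", "plum"], "default": "apple"},
}
df = pd.DataFrame({"fruit": ["pear", "kiwi", None, "plum"]})

limited = df.copy()
for column, spec in factors_per_column.items():
    # isin() is False for missing values, so they fall back to the default too.
    keep = limited[column].isin(spec["factors"])
    limited[column] = limited[column].where(keep, spec["default"])
```

Here 'kiwi' and the missing entry both become 'apple', while the acceptable values pass through unchanged.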
class sktutor.preprocessing.GenericTransformer(function, params=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Generic transformer that applies a user-defined function within the pipeline framework. The callable should only make transformations; it does not store any fit() parameters. Lambda functions are not supported because they cannot be pickled.
Parameters:
- function (callable) – An arbitrary function to use as a transformer.
- params (dict) – A dict with function parameter names as keys and parameter values as values.
class sktutor.preprocessing.GroupByImputer(impute_type, group=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Impute missing values by group with a specified function. If a group parameter is given, impute_type can be the name of any function that can be passed to the agg function of a pandas GroupBy object. If a group parameter is not given, only 'mean', 'median', and 'most_frequent' can be used.
Parameters:
- impute_type (string) – The type of imputation to be performed.
- group (string or list of strings) – The column name, or a list of column names, to group the pandas DataFrame by.
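Group-wise imputation can be sketched in plain pandas as follows. The data and column names are hypothetical, and the snippet assumes 'mean' as the imputation function; the class itself may compute the fill values differently.

```python
import pandas as pd

# Hypothetical data: one missing sales figure per store.
df = pd.DataFrame({
    "store": ["a", "a", "a", "b", "b"],
    "sales": [1.0, None, 3.0, 10.0, None],
})

# Per-row mean of the row's own group; missing values are skipped in the mean.
group_means = df.groupby("store")["sales"].transform("mean")

# Fill each gap with the mean of its group rather than the global mean.
imputed = df["sales"].fillna(group_means)
```

The gap in store 'a' is filled with 2.0 and the gap in store 'b' with 10.0, whereas a global mean would have assigned the same value to both.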
class sktutor.preprocessing.InteractionCreator(columns1, columns2)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create interactions across columns of a DataFrame.
Parameters:
- columns1 (list of strings) – The first list of columns, each of which interacts with each column in the second list.
- columns2 (list of strings) – The second list of columns, each of which interacts with each column in the first list.
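The pairing of the two lists can be sketched as a Cartesian product. In the snippet below the interaction is taken to be an element-wise product and the new columns are named 'left:right'; both choices are assumptions for illustration, since this reference does not state how the class computes or names its output.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
columns1 = ["a", "b"]
columns2 = ["c"]

interactions = df.copy()
# Every column in columns1 is paired with every column in columns2.
for left, right in product(columns1, columns2):
    # Hypothetical naming scheme for the new column.
    interactions[f"{left}:{right}"] = df[left] * df[right]
```

Two columns crossed with one column yield two interaction columns; in general the count is len(columns1) * len(columns2).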
class sktutor.preprocessing.MissingColumnsReplacer(cols, value)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Fill in missing columns in a DataFrame.
Parameters:
- cols – The expected list of columns.
- value – The value to fill the new columns with by default.
class sktutor.preprocessing.MissingValueFiller(value)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Fill missing values with a specified value. Should only be used with columns of similar dtypes.
Parameters: value – The value to impute for missing factors.
class sktutor.preprocessing.OverMissingThresholdDropper(threshold)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Drop columns with more missing data than a given threshold.
Parameters: threshold (float) – The maximum proportion of missing data that is acceptable. Must be within the interval [0, 1].
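A minimal pandas sketch of the thresholding rule, on hypothetical data, might look like this. It assumes a column exactly at the threshold is kept, which follows the wording "more missing data than" but is not confirmed by this reference.

```python
import pandas as pd

df = pd.DataFrame({
    "mostly_missing": [1.0, None, None, None],   # 75% missing
    "mostly_present": [1.0, 2.0, 3.0, None],     # 25% missing
})
threshold = 0.5

# Proportion of missing values per column.
missing_share = df.isnull().mean()

# Keep only the columns whose missing share does not exceed the threshold.
kept = df.loc[:, missing_share <= threshold]
```

With a threshold of 0.5, only 'mostly_present' survives.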
class sktutor.preprocessing.PolynomialFeatures(degree=2, interaction_only=False)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create polynomial features from inputs.
Parameters:
- degree – The degree of the polynomial.
- interaction_only – If true, only interaction features are produced: features that are products of at most degree distinct input features.
class sktutor.preprocessing.SingleValueAboveThresholdDropper(threshold=1, dropna=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Remove columns in which a single value represents a higher percentage of values than a given threshold.
Parameters:
- threshold (float) – The percentage of a single value in a column above which the column is removed.
- dropna (boolean) – If True, do not consider NaN as a value.
class sktutor.preprocessing.SingleValueDropper(dropna=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Drop columns with only one unique value.
Parameters: dropna (boolean) – If True, do not consider NaN as a value.
class sktutor.preprocessing.SklearnPandasWrapper(transformer)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Wrap a scikit-learn Transformer with a pandas-friendly version that keeps columns and row indices in place. Will only work for Transformers that do not add columns or change their order.
Parameters: transformer (sklearn Transformer) – The scikit-learn compatible Transformer object.
class sktutor.preprocessing.StandardScaler(columns=None, **kwargs)
Bases: sklearn.preprocessing._data.StandardScaler
Standardize features by removing the mean and scaling to unit variance.
    fit(X, y=None, **fit_params)
    Fit the transformer on X.
    Parameters: X (pandas DataFrame) – The input data.
    Returns: self.
    fit_transform(X, y=None, **fit_params)
    Fit and transform the StandardScaler on X.
    Parameters: X (pandas DataFrame) – The input data.
    Returns: self.
class sktutor.preprocessing.TextContainsDummyExtractor(mapper)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Extract one or more dummy variables based on whether one or more text columns contain one or more strings.
Parameters: mapper (dict) – A mapping from each old text column to its new columns and the criteria under which each new column is populated as True.
mapper takes the form:
    {'old_column1':
        {'new_column1': [{'pattern': 'string1', 'kwargs': {'case': False}},
                         {'pattern': 'string2', 'kwargs': {'case': False}}],
         'new_column2': [{'pattern': 'string3', 'kwargs': {'case': False}},
                         {'pattern': 'string4', 'kwargs': {'case': False}}]},
     'old_column2':
        {'new_column3': [{'pattern': 'string5', 'kwargs': {'case': False}},
                         {'pattern': 'string6', 'kwargs': {'case': False}}],
         'new_column4': [{'pattern': 'string7', 'kwargs': {'case': False}},
                         {'pattern': 'string8', 'kwargs': {'case': False}}]}}
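One plausible reading of this mapper is sketched below with pandas. It assumes each pattern and its kwargs are passed to Series.str.contains and that a new column is True when any of its patterns matches; neither assumption is confirmed by this reference, and the data is hypothetical.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({"old_column1": ["Red apple", "green PEAR", "blue"]})
mapper = {
    "old_column1": {
        "new_column1": [
            {"pattern": "apple", "kwargs": {"case": False}},
            {"pattern": "pear", "kwargs": {"case": False}},
        ],
    },
}

result = df.copy()
for old_col, new_cols in mapper.items():
    for new_col, specs in new_cols.items():
        # One boolean Series per pattern, e.g. case-insensitive matching.
        matches = [
            df[old_col].str.contains(spec["pattern"], **spec["kwargs"])
            for spec in specs
        ]
        # Assumed combination rule: True if any pattern matched.
        result[new_col] = reduce(operator.or_, matches)
```

Because 'case' is False, 'green PEAR' still matches the pattern 'pear'.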
class sktutor.preprocessing.TypeExtractor(type)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Return a DataFrame with only the specified field type.
Parameters: type (string) – The desired type; either 'numeric' or 'categorical'.
class sktutor.preprocessing.ValueReplacer(mapper=None, inverse_mapper=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Replace values in each column according to a nested dictionary. inverse_mapper is probably more intuitive when one value replaces many values. Only one of inverse_mapper or mapper can be used.
Parameters:
- mapper (dictionary) – Nested dictionary with columns mapping to dictionaries that map old values to new values.
- inverse_mapper (dictionary) – Nested dictionary with columns mapping to dictionaries that map new values to a list of old values.
mapper takes the form:
    {'column_name': {'old_value1': 'new_value1',
                     'old_value2': 'new_value1',
                     'old_value3': 'new_value2'}}
while inverse_mapper takes the form:
    {'column_name': {'new_value1': ['old_value1', 'old_value2'],
                     'new_value2': ['old_value3']}}
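The relationship between the two forms can be made concrete with a small conversion in plain Python. The helper `invert` is hypothetical and is not part of sktutor; it only shows that an inverse_mapper carries the same information as a mapper.

```python
def invert(inverse_mapper):
    """Expand {column: {new: [old, ...]}} into {column: {old: new}}."""
    mapper = {}
    for column, new_to_olds in inverse_mapper.items():
        mapper[column] = {
            old: new
            for new, olds in new_to_olds.items()
            for old in olds
        }
    return mapper


inverse_mapper = {
    "column_name": {
        "new_value1": ["old_value1", "old_value2"],
        "new_value2": ["old_value3"],
    },
}
mapper = invert(inverse_mapper)
```

Each old value should appear under only one new value; if it appeared under two, the later entry would silently win in this conversion.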
sktutor.pipeline module
class sktutor.pipeline.FeatureUnion(transformer_list, *, n_jobs=None, transformer_weights=None, verbose=False)
Bases: sklearn.pipeline.FeatureUnion
Perform a list of transformations in parallel and concatenate the results.
Parameters:
- transformer_list – A list of (string, transformer) tuples.
- n_jobs – The number of jobs to run in parallel. The default, None, is treated as 1.
sktutor.pipeline.make_union(*transformers, **kwargs)
Construct a FeatureUnion from the given transformers. This is a shorthand for the FeatureUnion constructor; it does not require, and does not permit, naming the transformers. Instead, they are given names automatically based on their types. It also does not allow weighting.
Parameters:
- transformers – A list of estimators.
- n_jobs – The number of jobs to run in parallel (default 1).
Return type: