sktutor package
Submodules
sktutor.preprocessing module
class sktutor.preprocessing.BitwiseOperator(operator, mapper)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Apply a bitwise operator (& or |) to a list of columns.
Parameters:
- mapper (dict) – A mapping from each new column to the list of old columns it is built from by applying the bitwise operator.
- operator (str) – The name of the bitwise operator to apply. Acceptable inputs are ‘and’ and ‘or’.
mapper takes the form:
{'new_column1': ['old_column1', 'old_column2', 'old_column3'], 'new_column2': ['old_column2', 'old_column4', 'old_column5'] }
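A minimal pandas sketch of what the mapper describes, assuming boolean input columns and that ‘and’/‘or’ correspond to & and |. The helper apply_bitwise and the OPS table are hypothetical illustrations, not part of sktutor's API.

```python
from functools import reduce
import operator as op

import pandas as pd

# Assumed mapping from the documented operator names to Python's bitwise operators
OPS = {'and': op.and_, 'or': op.or_}


def apply_bitwise(df, operator, mapper):
    """Hypothetical helper: add one column per mapper key by folding its old columns."""
    func = OPS[operator]  # raises KeyError for anything other than 'and' / 'or'
    out = df.copy()
    for new_col, old_cols in mapper.items():
        # a & b & c ... (or a | b | c ...) evaluated element-wise on the Series
        out[new_col] = reduce(func, (out[c] for c in old_cols))
    return out


df = pd.DataFrame({'a': [True, True, False], 'b': [True, False, False]})
result = apply_bitwise(df, 'and', {'a_and_b': ['a', 'b']})
print(result['a_and_b'].tolist())  # [True, False, False]
```

Whether the real transformer keeps the old columns alongside the new ones is not stated above; this sketch keeps them.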
class sktutor.preprocessing.BoxCoxTransformer(adder=0)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Apply a Box-Cox transformation to every column.
Parameters: adder (numeric) – The amount added to each column before the Box-Cox transformation.
fit(X, y=None, **fit_params)
Fit the transformer on X.
Parameters: X (pandas DataFrame) – The input data.
Returns: self.
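For reference, the standard one-parameter Box-Cox transform, with the shift a = adder applied first, is shown below. The shifted values must be strictly positive. How BoxCoxTransformer estimates λ for each column is not documented here; fitting λ by maximum likelihood, as scipy.stats.boxcox does, is an assumption.

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{(x + a)^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
\ln(x + a), & \lambda = 0,
\end{cases}
\qquad \text{with } x + a > 0.
```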
class sktutor.preprocessing.ColumnDropper(col)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Drop a list of columns from a DataFrame.
Parameters: col (list of strings) – A list of columns to drop from the DataFrame.
class sktutor.preprocessing.ColumnExtractor(col)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Extract a list of columns from a DataFrame.
Parameters: col (list of strings) – A list of columns to extract from the DataFrame.
class sktutor.preprocessing.ColumnNameCleaner
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Replaces spaces and formula symbols in column names that conflict with patsy formula interpretation.
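A minimal sketch of the idea using only the standard library. The exact set of characters ColumnNameCleaner treats as unsafe, and what it replaces them with, is not documented here; the character class below (whitespace plus common patsy operators) and the underscore replacement are assumptions, and clean_column_name is a hypothetical helper.

```python
import re

# Assumption: whitespace and these patsy formula operators are the unsafe characters.
# The real ColumnNameCleaner may use a different set or replacement.
UNSAFE = re.compile(r"[\s+\-*/:~()]")


def clean_column_name(name):
    """Hypothetical helper: replace each unsafe character with an underscore."""
    return UNSAFE.sub('_', name)


print([clean_column_name(c) for c in ['sale price', 'x*y', 'log(z)']])
# ['sale_price', 'x_y', 'log_z_']
```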
class sktutor.preprocessing.ColumnValidator
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Ensure that the transformed dataset has the same columns, in the same order, as the dataset used in fit. Useful as a check at the beginning and end of a pipeline.
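The check can be sketched as a fit/transform pair that records the fitted column order and compares against it later. ColumnOrderCheck is a hypothetical stand-in, and raising ValueError on a mismatch (rather than, say, reordering) is an assumption about how ColumnValidator reports the problem.

```python
import pandas as pd


class ColumnOrderCheck:
    """Hypothetical stand-in: remember the fit columns, then verify them on transform."""

    def fit(self, X, y=None):
        self.columns_ = list(X.columns)
        return self

    def transform(self, X):
        # Assumption: a mismatch in names or order is reported by raising
        if list(X.columns) != self.columns_:
            raise ValueError(f"expected columns {self.columns_}, got {list(X.columns)}")
        return X


train = pd.DataFrame({'a': [1], 'b': [2]})
check = ColumnOrderCheck().fit(train)
check.transform(train)                 # passes unchanged
# check.transform(train[['b', 'a']])   # would raise ValueError: order differs
```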
class sktutor.preprocessing.ContinuousFeatureBinner(field, bins, right_inclusive=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Creates bins for a continuous feature.
Parameters:
- field (string) – The continuous field for which to create bins.
- bins (array-like) – The criteria to bin by.
- right_inclusive (bool) – Whether the intervals are right-inclusive.
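The effect of right_inclusive can be illustrated with pandas.cut, whose right argument plays the same role. That ContinuousFeatureBinner is built on pd.cut is an assumption; the data and bin edges are made up.

```python
import pandas as pd

df = pd.DataFrame({'age': [5, 18, 30, 65]})
bins = [0, 18, 65, 120]

# right=True gives intervals (0, 18], (18, 65], (65, 120]
right_closed = pd.cut(df['age'], bins=bins, right=True)
# right=False gives intervals [0, 18), [18, 65), [65, 120)
left_closed = pd.cut(df['age'], bins=bins, right=False)

print(right_closed.cat.codes.tolist())  # [0, 0, 1, 1]
print(left_closed.cat.codes.tolist())   # [0, 1, 1, 2]
```

Values that sit exactly on an edge (18 and 65 here) land in different bins depending on the setting.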
class sktutor.preprocessing.DummyCreator(**kwargs)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Create dummy variables from categorical variables.
Parameters:
- dummy_na (boolean) – If True, add a column to indicate NaNs; if False, NaNs are ignored.
- drop_first (boolean) – Whether to get k-1 dummies out of k categorical levels by removing the first level.
fit(X, y=None, **fit_params)
Fit the dummy creator on X, retaining a record of the columns produced from the fitting data.
Parameters: X (pandas DataFrame) – The input data.
Returns: self.
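Retaining the fitted columns matters because new data may contain unseen levels or lack some levels. A minimal sketch with pandas.get_dummies and reindex, assuming DummyCreator forwards dummy_na and drop_first to get_dummies and aligns later data to the recorded columns (unseen levels dropped, absent levels filled with 0):

```python
import pandas as pd

train = pd.DataFrame({'color': ['red', 'blue', None]})
new = pd.DataFrame({'color': ['blue', 'green']})

# "fit": record the dummy columns produced from the training data
fitted_columns = list(pd.get_dummies(train, dummy_na=True).columns)
print(fitted_columns)  # ['color_blue', 'color_red', 'color_nan']

# "transform": align new data to the recorded columns
# ('color_green' is dropped, the absent 'color_red' is filled with 0)
new_dummies = pd.get_dummies(new, dummy_na=True).reindex(columns=fitted_columns, fill_value=0)
```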
class sktutor.preprocessing.FactorLimiter(factors_per_column=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
For each named column, limit factors to a list of acceptable values. Non-conforming factors, including missing values, are replaced by a default value.
Parameters: factors_per_column (dictionary) – Dictionary mapping column names to a dictionary with a list of acceptable factor values and a default value for non-conforming values.
factors_per_column takes the form:
{'column_name': {'factors': ['value1', 'value2', 'value3'], 'default': 'value1'}, }
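The replacement rule can be sketched with Series.isin and Series.where: a value is kept only if it appears in the column's factors list, and everything else, NaN included, becomes the default. This is an illustration of the documented behavior, not sktutor's code.

```python
import pandas as pd

factors_per_column = {'color': {'factors': ['red', 'blue'], 'default': 'red'}}
df = pd.DataFrame({'color': ['red', 'green', None, 'blue']})

out = df.copy()
for col, spec in factors_per_column.items():
    # isin() is False for NaN/None, so missing values also get the default
    out[col] = out[col].where(out[col].isin(spec['factors']), spec['default'])

print(out['color'].tolist())  # ['red', 'red', 'red', 'blue']
```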
class sktutor.preprocessing.GenericTransformer(function, params=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Generic transformer that applies a user-defined function within the pipeline framework. The callable should only transform the data; no parameters are learned or stored in fit(). Lambda functions are not supported because they cannot be pickled.
Parameters:
- function (callable) – Arbitrary function to use as a transformer.
- params (dict) – Dict with function parameter names as keys and parameter values as values.
class sktutor.preprocessing.GroupByImputer(impute_type, group=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Imputes missing values by group with the specified function. If group is given, impute_type can be the name of any function that can be passed to the agg method of a pandas GroupBy object. If group is not given, only ‘mean’, ‘median’, and ‘most_frequent’ can be used.
Parameters:
- impute_type (string) – The type of imputation to perform.
- group (string or list of strings) – The column name or list of column names by which to group the pandas DataFrame.
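Group-wise imputation of a single column can be sketched with GroupBy.transform, which broadcasts each group's statistic back to the original rows so it can feed fillna. This covers only named pandas reductions such as 'mean' or 'median'; how GroupByImputer handles 'most_frequent' or multiple columns is not shown, and the data are made up.

```python
import pandas as pd

df = pd.DataFrame({
    'store': ['a', 'a', 'b', 'b'],
    'sales': [10.0, None, 4.0, None],
})
impute_type = 'mean'  # e.g. 'median' also works with GroupBy.transform
group = 'store'

# One group statistic per row, aligned to the original index
group_stat = df.groupby(group)['sales'].transform(impute_type)
df['sales'] = df['sales'].fillna(group_stat)

print(df['sales'].tolist())  # [10.0, 10.0, 4.0, 4.0]
```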
class sktutor.preprocessing.InteractionCreator(columns1, columns2)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Creates interactions across columns of a DataFrame.
Parameters:
- columns1 (list of strings) – First list of columns; each is interacted with each column in columns2.
- columns2 (list of strings) – Second list of columns; each is interacted with each column in columns1.
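For numeric columns, the pairing can be sketched as one element-wise product per (columns1, columns2) pair. The helper add_interactions and the ‘_x_’ naming scheme are hypothetical; the names InteractionCreator gives its output columns, and how it treats non-numeric columns, are not documented here.

```python
import pandas as pd


def add_interactions(df, columns1, columns2):
    """Hypothetical helper: add one product column for every (c1, c2) pair."""
    out = df.copy()
    for c1 in columns1:
        for c2 in columns2:
            out[f'{c1}_x_{c2}'] = df[c1] * df[c2]  # naming scheme is an assumption
    return out


df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
result = add_interactions(df, ['a'], ['b', 'c'])
print(result.columns.tolist())  # ['a', 'b', 'c', 'a_x_b', 'a_x_c']
```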
class sktutor.preprocessing.MissingColumnsReplacer(cols, value)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Fill in missing columns in a DataFrame.
Parameters:
- cols – The expected list of columns.
- value – The value to fill the new columns with by default.
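A minimal sketch of the behavior: every expected column the DataFrame lacks is appended and filled with value. The helper add_missing_columns is hypothetical, and whether the real transformer also reorders columns to match cols is not stated; this sketch leaves existing columns where they are.

```python
import pandas as pd


def add_missing_columns(df, cols, value):
    """Hypothetical helper: append each expected column that df lacks, filled with value."""
    out = df.copy()
    for col in cols:
        if col not in out.columns:
            out[col] = value
    return out


df = pd.DataFrame({'a': [1, 2], 'c': [5, 6]})
result = add_missing_columns(df, cols=['a', 'b', 'c'], value=0)
print(result.columns.tolist())  # ['a', 'c', 'b']
```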
class sktutor.preprocessing.MissingValueFiller(value)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Fill missing values with a specified value. Should only be used with columns of similar dtypes.
Parameters: value – The value to impute for missing values.
class sktutor.preprocessing.OverMissingThresholdDropper(threshold)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Drop columns with more missing data than a given threshold.
Parameters: threshold (float) – Maximum proportion of missing data that is acceptable. Must be within the interval [0, 1].
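The threshold test amounts to comparing each column's share of missing values, df.isna().mean(), against threshold. In this sketch a column exactly at the threshold is kept, following the "more missing data than" wording; the data are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'mostly_missing': [np.nan, np.nan, np.nan, 1.0],  # 75% missing
    'complete': [1.0, 2.0, 3.0, 4.0],                 # 0% missing
})
threshold = 0.5

missing_share = df.isna().mean()  # proportion of NaNs per column
kept = df.loc[:, missing_share <= threshold]

print(kept.columns.tolist())  # ['complete']
```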
class sktutor.preprocessing.PolynomialFeatures(degree=2, interaction_only=False)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Creates polynomial features from inputs.
Parameters:
- degree – The degree of the polynomial.
- interaction_only – If True, only interaction features are produced: features that are products of at most degree distinct input features.
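Which terms interaction_only keeps can be shown by enumerating products of feature names with itertools. This follows scikit-learn's PolynomialFeatures convention without its optional bias column; whether sktutor's version adds a bias term or names its columns with ‘*’ is an assumption, and polynomial_terms is a hypothetical helper.

```python
from itertools import combinations, combinations_with_replacement


def polynomial_terms(features, degree=2, interaction_only=False):
    """Hypothetical helper: list feature products of total degree 1..degree (no bias)."""
    # interaction_only forbids repeating a feature, so x*x and similar terms are skipped
    pick = combinations if interaction_only else combinations_with_replacement
    terms = []
    for d in range(1, degree + 1):
        terms.extend('*'.join(combo) for combo in pick(features, d))
    return terms


print(polynomial_terms(['a', 'b']))                         # ['a', 'b', 'a*a', 'a*b', 'b*b']
print(polynomial_terms(['a', 'b'], interaction_only=True))  # ['a', 'b', 'a*b']
```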
class sktutor.preprocessing.SingleValueAboveThresholdDropper(threshold=1, dropna=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Removes columns in which a single value accounts for a higher share of the values than a given threshold.
Parameters:
- threshold (float) – The share of a column's values (between 0 and 1) held by a single value, used as the cutoff for removing the column.
- dropna (boolean) – If True, do not consider NaN as a value.
class sktutor.preprocessing.SingleValueDropper(dropna=True)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Drop columns with only one unique value.
Parameters: dropna (boolean) – If True, do not consider NaN as a value.
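The effect of dropna can be illustrated with DataFrame.nunique, which takes the same flag. A column such as [2, NaN, 2] counts as single-valued only when NaN is not considered a value. The helper drop_single_value_columns is a hypothetical sketch.

```python
import pandas as pd


def drop_single_value_columns(df, dropna=True):
    """Hypothetical helper: keep only columns with more than one distinct value."""
    n_unique = df.nunique(dropna=dropna)  # dropna=True ignores NaN when counting
    return df.loc[:, n_unique > 1]


df = pd.DataFrame({
    'constant': [1, 1, 1],
    'constant_with_nan': [2, None, 2],
    'varies': [1, 2, 3],
})
print(drop_single_value_columns(df).columns.tolist())               # ['varies']
print(drop_single_value_columns(df, dropna=False).columns.tolist())  # ['constant_with_nan', 'varies']
```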
class sktutor.preprocessing.SklearnPandasWrapper(transformer)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Wrap a scikit-learn Transformer in a pandas-friendly version that keeps column names and row indices in place. Works only for Transformers that do not add columns or change their order.
Parameters: transformer (sklearn Transformer) – The scikit-learn compatible Transformer object.
class sktutor.preprocessing.StandardScaler(columns=None, **kwargs)
Bases: sklearn.preprocessing._data.StandardScaler
Standardize features by removing the mean and scaling to unit variance.
fit(X, y=None, **fit_params)
Fit the transformer on X.
Parameters: X (pandas DataFrame) – The input data.
Returns: self.
fit_transform(X, y=None, **fit_params)
Fit the StandardScaler on X and transform it.
Parameters: X (pandas DataFrame) – The input data.
Returns: The transformed data.
class sktutor.preprocessing.TextContainsDummyExtractor(mapper)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Extract one or more dummy variables based on whether one or more text columns contain one or more strings.
Parameters: mapper (dict) – A nested mapping from each text column to the new columns derived from it, and from each new column to the list of pattern criteria that set it to True.
mapper takes the form:
{'old_column1': {'new_column1': [{'pattern': 'string1', 'kwargs': {'case': False}}, {'pattern': 'string2', 'kwargs': {'case': False}} ], 'new_column2': [{'pattern': 'string3', 'kwargs': {'case': False}}, {'pattern': 'string4', 'kwargs': {'case': False}} ], }, 'old_column2': {'new_column3': [{'pattern': 'string5', 'kwargs': {'case': False}}, {'pattern': 'string6', 'kwargs': {'case': False}} ], 'new_column4': [{'pattern': 'string7', 'kwargs': {'case': False}}, {'pattern': 'string8', 'kwargs': {'case': False}} ] } }
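The kwargs entries in mapper match the keyword arguments of pandas Series.str.contains (for example case=False), so the extraction can be sketched as below. Two assumptions are labeled in the code: each pattern's kwargs are forwarded to str.contains, and a new column is True when any of its patterns matches. The column names and data are made up.

```python
import pandas as pd

mapper = {
    'description': {
        'mentions_pool': [{'pattern': 'pool', 'kwargs': {'case': False}}],
        'mentions_garage': [{'pattern': 'garage', 'kwargs': {'case': False}},
                            {'pattern': 'carport', 'kwargs': {'case': False}}],
    }
}
df = pd.DataFrame({'description': ['Large POOL', 'Two-car Garage', 'Carport only', 'Studio']})

out = df.copy()
for old_col, new_cols in mapper.items():
    for new_col, criteria in new_cols.items():
        # Assumption: kwargs are forwarded to Series.str.contains
        matches = [out[old_col].str.contains(c['pattern'], **c['kwargs']) for c in criteria]
        # Assumption: the dummy is True if ANY pattern matches
        out[new_col] = pd.concat(matches, axis=1).any(axis=1)

print(out['mentions_garage'].tolist())  # [False, True, True, False]
```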
class sktutor.preprocessing.TypeExtractor(type)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Returns a DataFrame containing only the columns of the specified type.
Parameters: type (string) – The desired type; either ‘numeric’ or ‘categorical’.
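A sketch of the split using DataFrame.select_dtypes. Treating ‘categorical’ as "every non-numeric column" is an assumption; TypeExtractor's exact dtype rules are not documented here.

```python
import pandas as pd

df = pd.DataFrame({'price': [1.5, 2.0], 'qty': [1, 2], 'color': ['red', 'blue']})

numeric = df.select_dtypes(include='number')
# Assumption: 'categorical' means every column that is not numeric
categorical = df.select_dtypes(exclude='number')

print(numeric.columns.tolist())      # ['price', 'qty']
print(categorical.columns.tolist())  # ['color']
```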
class sktutor.preprocessing.ValueReplacer(mapper=None, inverse_mapper=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Replaces values in each column according to a nested dictionary. inverse_mapper is usually more intuitive when one new value replaces many old values. Only one of inverse_mapper or mapper can be used.
Parameters:
- mapper (dictionary) – Nested dictionary with columns mapping to dictionaries that map old values to new values.
- inverse_mapper (dictionary) – Nested dictionary with columns mapping to dictionaries that map new values to a list of old values.
mapper takes the form:
{'column_name': {'old_value1': 'new_value1', 'old_value2': 'new_value1', 'old_value3': 'new_value2'} }
while the equivalent inverse_mapper takes the form:
{'column_name': {'new_value1': ['old_value1', 'old_value2'], 'new_value2': ['old_value3']} }
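The two forms carry the same information, and an inverse_mapper can be flattened into a mapper with a nested comprehension. invert_mapper is a hypothetical helper; whether ValueReplacer converts its input this way internally is not stated.

```python
def invert_mapper(inverse_mapper):
    """Hypothetical helper: turn {col: {new: [old, ...]}} into {col: {old: new}}."""
    return {
        col: {old: new for new, olds in mapping.items() for old in olds}
        for col, mapping in inverse_mapper.items()
    }


inverse_mapper = {'column_name': {'new_value1': ['old_value1', 'old_value2'],
                                  'new_value2': ['old_value3']}}
print(invert_mapper(inverse_mapper))
# {'column_name': {'old_value1': 'new_value1', 'old_value2': 'new_value1', 'old_value3': 'new_value2'}}
```

A nested dict of the resulting shape is also what pandas DataFrame.replace accepts for per-column replacements.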
sktutor.pipeline module
class sktutor.pipeline.FeatureUnion(transformer_list, *, n_jobs=None, transformer_weights=None, verbose=False)
Bases: sklearn.pipeline.FeatureUnion
Perform a list of transformations in parallel and concatenate the results.
Parameters:
- transformer_list – List of (string, transformer) tuples.
- n_jobs – Number of jobs to run in parallel (default None, which scikit-learn treats as 1 outside a joblib.parallel_backend context).
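The core idea, applying every transformer to the same input and joining the outputs column-wise, can be sketched with pandas.concat. Plain functions stand in for transformers here, and they run sequentially rather than in parallel. union_frames, squares and doubles are hypothetical, and the assumption is that this FeatureUnion concatenates DataFrames (scikit-learn's base class stacks arrays).

```python
import pandas as pd


def union_frames(df, transformers):
    """Hypothetical helper: apply each (name, func) to df and join results column-wise."""
    return pd.concat([func(df) for _, func in transformers], axis=1)


def squares(df):
    return (df ** 2).add_suffix('_sq')


def doubles(df):
    return (df * 2).add_suffix('_x2')


df = pd.DataFrame({'a': [1, 2]})
result = union_frames(df, [('squares', squares), ('doubles', doubles)])
print(result.columns.tolist())  # ['a_sq', 'a_x2']
```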
sktutor.pipeline.make_union(*transformers, **kwargs)
Construct a FeatureUnion from the given transformers. This is a shorthand for the FeatureUnion constructor; it does not require, and does not permit, naming the transformers. Instead, they are given names automatically based on their types. It also does not allow weighting.
Parameters:
- transformers – List of estimators.
- n_jobs – Number of jobs to run in parallel (default 1).
Return type: