Pandas
keyboard_arrow_down 655 guides
chevron_leftPandas
Common questions10 topics
Documentation5 topics
Cookbooks2 topics
Getting startedAPI referenceRecipes referencecheck_circle
Mark as learned thumb_up
4
thumb_down
1
chat_bubble_outline
0
Comment auto_stories Bi-column layout
settings
Pandas | Recipes reference
schedule Aug 12, 2023
Last updated local_offer
Tags Python●Pandas
tocTable of Contents
expand_more Master the mathematics behind data science with 100+ top-tier guides
Start your free 7-days trial now!
Start your free 7-days trial now!
DataFrame Cookbooks
Creating DataFrames Cookbook
- Combining multiple Series into a DataFrame in PandasTo combine multiple Series into a single DataFrame in Pandas, use the concat(~) method or use the DataFrame's constructor.
- Combining multiple Series to form a DataFrame in PandasTo combine multiple Series to form a DataFrame in Pandas, use the concat(~) method.
- Converting a Series to a DataFrame in PandasTo convert a Series to a DataFrame in Pandas, use the the Series' to_frame(~) method.
- Converting list of lists into Pandas DataFrameTo convert list of lists into Pandas DataFrame, call pass in the list into the DataFrame(~) constructor.
- Converting list to Pandas DataFrameTo convert a list into a Pandas DataFrame, pass the list directly into the DataFrame constructor.
- Converting percent string into a numeric for read_csv in PandasTo convert percent string into numeric when using read_csv(~) in Pandas, supply the appropriate converters parameter.
- Converting scikit-learn dataset to Pandas DataFrameTo convert a scikit-learn dataset to Pandas DataFrame, use the DataFrame constructor.
- Converting string data into a DataFrame in PandasTo read data given in string form as a Pandas DataFrame, use read_csv(~) along with StringIO.
- Creating a DataFrame from a string in PandasTo create a DataFrame from a string in Pandas, use StringIO(~) to convert the string into a file-like object, and then use read_csv(~) to read this object and parse it into a DataFrame.
- Creating a DataFrame using lists in PandasTo create a DataFrame using lists in Pandas, directly pass the lists into the DataFrame constructor.
- Creating a DataFrame with different type for each column in PandasTo create a DataFrame with different type for each column in Pandas, specify the dtype parameter of the DataFrame constructor.
- Creating a DataFrame with empty values in PandasTo create a DataFrame with empty values (NaN) in Pandas, call the DataFrame constructor with parameters columns and index.
- Creating a Pandas DataFrame with missing valuesTo create a Pandas DataFrame with missing values, pass in np.nan to the DataFrame constructor and supply the index and columns parameter.
- Creating a DataFrame with random numbers in PandasTo initialise a DataFrame containing random numbers in Pandas, use one of NumPy's functions to generate an array of random numbers, and then pass that into the DataFrame constructor.
- Creating a DataFrame with zeros in PandasTo create a DataFrame with zeros in Pandas, pass in the value 0 to the DataFrame constructor and supply the parameters index and columns.
- Creating a MultiIndex DataFrame in PandasTo create a MultiIndex DataFrame in Pandas, first create a MultiIndex object using methods from_tuples, from_arrays or from_product, and then pass this object directly to the DataFrame constructor.
- Creating a Pandas DataFrameTo create a Pandas DataFrame, use the DataFrame(~) constructor.
- Creating a single DataFrame from multiple files in PandasTo create a single DataFrame from multiple files in Pandas, read the files individually using read_csv(~), and then concatenate them using concat(~).
- Creating empty DataFrame with only column labels in PandasTo create an empty DataFrame with only column labels in Pandas, supply only the columns parameter in the DataFrame constructor.
- Filling missing values when using read_csv in PandasTo fill missing values when reading a file in Pandas, call the fillna(~) method after calling read_csv(~).
- Importing Dataset in PandasTo import and read a CSV file as a DataFrame, use Pandas read_csv(~) method.
- Comprehensive guide on importing tables from PostgreSQL as Pandas DataFramesTo read a PostgreSQL table as a Pandas DataFrame, first establish a connection to the server using sqlalchemy, and then use Pandas' read_sql(~) method to create a DataFrame.
- Initialising a DataFrame using a constant in PandasTo initialise a DataFrame using a single constant in Pandas, pass the constant into a DataFrame constructor and specify the parameters columns and index.
- Initialising a DataFrame using a dictionary in PandasTo initialise a DataFrame using a dictionary in Pandas, pass it directly into the DataFrame constructor.
- Initialising a Pandas DataFrame using a list of dictionariesTo initialise a Pandas DataFrame using a list of dictionaries, simply pass it to the DataFrame constructor.
- Inserting lists into a Pandas DataFrame cellTo insert a list into a Pandas DataFrame cell, ensure the type of the column is object, and then use at or iat property to insert the list.
- Keeping leading zeroes when using read_csv in PandasTo keep leading zeros for read_csv(~) in Pandas, specify the type as string for that column using the dtype parameter.
- Parsing dates when using read_csv in PandasTo parse datas when using read_csv(~) in Pandas, supply the parse_dates parameter.
- Preventing strings from getting parsed as NaN for read_csv in PandasTo prevent strings from getting parsed as NaN when using read_csv(~) in Pandas, set the keep_default_na=False parameter.
- Reading URL using read_csv in PandasTo read a dataset that resides in some URL in Pandas, directly pass the URL into read_csv(~).
- Reading data from GitHub in PandasTo read data from a GitHub page in Pandas, supply the URL to the read_csv(~) method.
- Reading file without header in PandasTo read a file without header into a Pandas DataFrame when using read_csv(~), set header=None.
- Reading large CSV files in chunks in PandasTo read large CSV files in chunks in Pandas, use the read_csv(~) method and specify the chunksize parameter. This is particularly useful if you are facing a MemoryError when trying to read in the whole DataFrame at once.
- Reading n random lines using read_csv in PandasTo read n random lines using read_csv(~) in Pandas, first get the number of lines of the file, and then use the Random.sample(~) method to get the row numbers to skip.
- Reading space-delimited files in PandasTo read space-separated files in Pandas, use read_csv(~) with either parameters sep=" " or delim_whitespace=True.
- Reading specific columns from file in PandasTo read specific columns from a file in Pandas, we can use the read_csv(~) method and specify the usecols parameter.
- Reading tab-delimited files in PandasTo read a tab-delimited file using read_csv(~) in Pandas, specify the parameter sep='\t'.
- Reading the first few lines of a file to create DataFrame in PandasTo read the first n lines of a file to create a Pandas DataFrame when using read_csv(~), set the parameter nrows.
- Reading the last n lines of a file in PandasTo read only the last n lines of a file into a Pandas DataFrame for read_csv(~), first fetch the total number of lines in the file, and then use the skiprows parameter.
- Reading zipped csv file as a Pandas DataFrameTo read a zipped csv file as a Pandas Frame, use Pandas' read_csv(~) method.
- Removing Unnamed:0 column in PandasWe can get an unwanted column named Unnamed:0 when creating a DataFrame from a csv file using the read_csv(~) method. To create a DataFrame without Unnamed:0, we can pass index_col=0 to our read_csv(~) call.
- Resolving ParserError: Error tokenizing data in PandasCommon reasons for ParserError: Error tokenizing data when initiating a Pandas DataFrame include: - Using the wrong delimiter - Number of fields in certain rows do not match with header To resolve the error we can try the following: - Specifying the delimiter through sep parameter in read_csv(~) - Fixing the original source file - Skipping bad rows
- Saving Pandas DataFrame as zipped csvTo save a Pandas DataFrame as a zipped csv, use the to_csv(~) method with compression="gzip".
- Skipping rows without skipping header for read_csv in PandasTo skip rows whilst keeping the column labels when using read_csv(~) in Pandas DataFrame, set the skiprows parameter.
- Specifying data type for read_csv in PandasTo specify a data type for the columns when using read_csv(~) in Pandas, pass a dictionary into the dtype parameter, where the key is the column name and the value is the desired data type for that column.
- Treating missing values as empty strings rather than NaN for read_csv in PandasTo parse as an empty string instead of a NaN when using read_csv(~) in Pandas, set the parameter keep_default_na=False.
- Combining multiple Series into a DataFrame in Pandas
Data Aggregation Cookbook
- Applying a function to multiple columns in groups in PandasTo group by group, and then apply a function on multiple columns in each group in Pandas, use the apply(~) method after groupby(~).
- Calculating percentiles of a DataFrame in PandasTo calculate percentiles in Pandas, use the quantile(~) method.
- Calculating the percentage of each value in each group in PandasTo compute the percentage of each value in each distinct group in Pandas, call the groupby(~) method and then pass in the following function lambda my_df: my_df / my_df.sum(). .
- Computing descriptive statistics of each group in PandasTo get the descriptive statistic of each group in Pandas, call the describe() method after groupby(~).
- Difference between a group's count and size in PandasThe difference between a group's count() and size() in Pandas groupby(~) is that count() returns the number of non-nan values for each column, and size() returns the length, that is, the number of rows of a group.
- Difference between methods apply and transform for groupby in PandasThe differences between methods apply and transform for groupby in Pandas are that, the function passed in apply takes in as argument a DataFrame representing each group while that in transform takes in a Series representing a column of each group.
- Getting cumulative sum of each group in Pandas DataFrameTo compute the cumulative sum of each group in Pandas DataFrame, call df.groupby('name')['score'].cumsum().
- Getting descriptive statistics of DataFrame in PandasTo get the descriptive statistics of DataFrame in Pandas, use the DataFrame's describe(~) method.
- Getting multiple aggregates of a column after grouping in PandasTo get multiple aggregates of a column after grouping by groupby(~) in Pandas, call agg(~) method using either a list of aggregate functions or keyword arguments, where the keyword becomes the resulting column label.
- Getting n rows with smallest column value in each group in PandasTo get the n rows with smallest column value in each group in Pandas, first sort the DataFrame in ascending order by sort-values(~), and then perform a groupby(~) followed by head(n).
- Getting number of distinct rows in each group in Pandas DataFrameTo get the number of distinct scores in each group in Pandas DataFrame, use 'nunique' with groupby(~) and agg(~).
- Getting size of each group in PandasTo get the size of each group in Pandas, use the groups' size() method.
- Getting specific group after groupby in PandasTo get a specific group after calling groupby(~) in Pandas, use the get_group(~) method, which returns a DataFrame.
- Getting the first row of each group in PandasTo get the first row of each group in Pandas, call first() after calling groupby(~).
- Getting the last row of each group in PandasTo get the last row of each group in Pandas, call last() after calling groupby(~).
- Getting the top n rows with largest column value in each group in PandasTo get the top n rows with the largest column value in each group in Pandas, use the DataFrame's sort_values method, then group by the column, and finally use the head function to fetch the top n rows.
- Getting unique values of each group in PandasTo get the number of unique values in each group in Pandas, use nunique(~) after calling groupby(~).
- Grouping by multiple columns in PandasTo group by multiple columns in Pandas, pass in an array of column labels to groupby(~).
- Grouping without turning group column into index in PandasTo perform grouping without turning the group column into an Index in Pandas, set as_index=false for groupby(~) method.
- Merging rows within a group together in PandasTo merge rows within a group together in Pandas we can use the agg(~) method together with the join(~) method to concatenate the row values.
- Naming columns after aggregation in Pandas DataFrameTo name columns after aggregation in Pandas DataFrame, use named arguments in the agg(~) method.
- Sorting values within groups in PandasTo sort values within groups in Pandas, first sort the DataFrame by sort_values(~), and then use the groupby(~) method.
- Applying a function to multiple columns in groups in Pandas
Data Manipulation Cookbook
- Adding a prefix to column values in PandasTo add a prefix to column values in Pandas DataFrame, directly use the + operator to concatenate a string to the column values (broadcasting), or use the Series' str.pad(~) method.
- Adding leading zeros to strings of a column in PandasTo add leading zeros to strings of a column in Pandas DataFrame, use the Series' str.zfill(~) method.
- Adding new column using lists in Pandas DataFrameTo add a single column to a DataFrame, use the square-brackets syntax directly. To add multiple columns, use Pandas concat(~) method.
- Adding padding to a column of strings in PandasTo add padding to a column of strings in Pandas DataFrame, use the Series' str.pad(~) method.
- Bit-wise OR in NumPy and PandasThe pipe character (|) is used to perform a bit-wise OR operation in NumPy and Pandas.
- Changing column type to string in Pandas DataFrameTo change a column type to string in Pandas DataFrame, use astype("string").
- Conditionally updating values of a DataFrame in PandasTo conditionally update values in a Pandas DataFrame, create a boolean mask and then pass it into loc, and finally perform assignment.
- Converting K and M to numerical form in Pandas DataFrameTo convert "K" (thousand) and "M" (million) to numerical form in Pandas DataFrame, use df["A"].replace({"K":"*1e3", "M":"*1e6"}, regex=True).map(pd.eval).astype(int).
- Converting all object-typed columns to categorical type in Pandas DataFrameTo convert all object-typed columns to categorical type in Pandas DataFrame, first obtain a list of column labels where the column is of type object, and then iterate over this list to perform the conversion to categorical.
- Converting column type to date in Pandas DataFrameTo convert the column type from object or string to datetime in Pandas DataFrame, use the pd.to_datetime(~) method.
- Converting column type to float in Pandas DataFrameTo convert the column type to float in Pandas DataFrame, either use the Series' astype() method, or use Pandas' to_numeric() method.
- Converting column type to integer in Pandas DataFrameTo convert the column type to integer in Pandas DataFrame, either use the Series' astype() method or use Pandas' to_numeric() method.
- Converting string categories or labels to numeric values in PandasTo encode the string labels or categories with numeric integers in Pandas, use the codes property of pd.Categorical(~).
- Encoding categorical variables in PandasTo encode categorical variables, either using one-hot encoding or dummy coding, use Pandas get_dummies(~) method.
- Expanding lists vertically in a DataFrame in PandasTo expand lists vertically in a Pandas DataFrame, use the DataFrame's explode(~) method.
- Expanding strings vertically in a DataFrame in PandasTo expand strings vertically in Pandas, use the DataFrame's explode(~) method.
- Extracting numbers from column in Pandas DataFrameTo extract numbers from column values in Pandas DataFrame, use str.extract('(\d+)').
- Filling missing value in Index of Pandas DataFrameTo fill missing entries in the index of a Pandas DataFrame, use the reindex(~) method.
- Filtering column values using boolean masks in Pandas DataFrameTo filter column values using boolean masks in Pandas DataFrame, use the Series' loc property.
- Logical AND operation in Pandas DataFrameUse & to perform a logical AND operation in Pandas DataFrame.
- Making DataFrame string column lowercase in PythonWe can making a DataFrame string column lowercase in Python using the str.lower() method.
- Mapping True and False to 1 and 0 respectively in Pandas DataFrameTo map booleans True and False to 1 and 0 respectively in Pandas DataFrame, perform casting using astype(int).
- Mapping values of a DataFrame using a dictionary in PandasTo map values of a Pandas DataFrame using a dictionary, use the DataFrame's replace(~) method.
- Modifying a single value in a Pandas DataFrameTo modify a single value in a Pandas DataFrame, use either iloc when using integer indices, or loc when using row and column labels.
- Removing characters from columns in Pandas DataFrameTo remove characters from columns in Pandas DataFrame, use the replace(~) method.
- Removing comma from column values in Pandas DataFrameTo remove comma from column values in Pandas DataFrame, use the Series' str.replace(~) method.
- Removing first n characters from column values in Pandas DataFrameTo remove the first n characters from column values from this Pandas DataFrame, use the vectorised str slicing approach.
- Removing last n characters from column values in Pandas DataFrameTo remove the last n characters from values from column A in Pandas DataFrame, use df["A"].str[:-1].
- Removing leading substring in Pandas DataFrameTo remove the leading substring 'ab', use the Series' str.replace(~) method.
- Removing trailing substring in Pandas DataFrameTo remove the trailing substring 'ab', use the Series' str.replace(~) method.
- Replacing infinities with another value in Pandas DataFrameTo replace infinities (np.inf) with another value in a Pandas DataFrame, use the replace(~) method.
- Replacing values in a DataFrame in PandasTo replace values in a Pandas DataFrame, use the DataFrame's replace(~) method.
- Rounding values in PandasTo round values in Pandas use the DataFrame.round(~) method. The method takes a single parameter which specifies the number of decimals to round to.
- Sorting categorical columns in Pandas DataFrameTo sort the DataFrame based on a categorical column, we first convert the column into a categorical type, and set the ordering using the second parameter. We then use sort_values(~) to perform the sort.
- Using previous row to create new columns in Pandas DataFrameTo use previous rows to create new columns, first make a Series using Pandas' shift(~) method, and then perform computation using this Series.
- Adding a prefix to column values in Pandas
Handling Missing Values
- Adding missing dates in Datetime Index in Pandas DataFrameTo add the missing dates in DatetimeIndex, replace the index with a new index using reindex(~).
- Checking if a DataFrame contains any missing values in PandasTo check if a Pandas DataFrame contains any missing values, use df.isna().any(axis=None).
- Checking if a certain value in a DataFrame is missing (NaN) in PandasTo check if a certain value in a Pandas DataFrame is missing (NaN), use the methods isna() and at.
- Converting a Pandas column with missing values to integer typeTo convert column A to integer type in Pandas DataFrame, use df['A'].astype('Int64').
- Counting non-missing values in PandasTo count non-missing values in rows or columns of a Pandas DataFrame use the count(~) method.
- Counting number of missing values (NaN) in each column of a Pandas DataFrameTo count the number of missing values (NaNs) of each column in a Pandas DataFrame, use df.isna().sum().
- Counting number of rows with missing values in Pandas DataFrameTo count the number of rows that contain at least one missing value in Pandas DataFrame, use df.isna().any(axis=1).sum(). To count the number of rows with all missing values in Pandas DataFrame, use df.isna().all(axis=1).sum().
- Counting the number of missing values (NaNs) in each row of a Pandas DataFrameTo count the number of missing values (NaNs) in each row of a Pandas DataFrame, use df.isna().sum(axis=1).
- Counting the total number of missing values (NaNs) of a Pandas DataFrameTo count the total number of missing values (NaNs) in a Pandas DataFrame, use df.isna().values.sum().
- Filling missing values using another column values in Pandas DataFrameTo fill the missing values in column A using values in column B in Pandas DataFrame, use df.loc[df["A"].isnull(), "A"] = df["B"].
- Filling missing values (NaNs) with the mean of the column in Pandas DataFrameTo fill missing values (NaNs) with the mean of the column in Pandas DataFrame, use df.fillna(df.mean()).
- Finding columns with missing values (NaNs) in Pandas DataFrameTo find columns with at least one NaN in a Pandas DataFrame, use df.isna().any(). To find columns that contain only NaN, use df.isna().all().
- Getting index of rows with missing values (NaNs) in Pandas DataFrameTo get the index of rows with missing values in Pandas DataFrame, use temp = df.isna().any(axis=1), and then temp[temp].index.
- Getting index of rows without missing values in Pandas DataFrameTo get the index of rows without missing values in a Pandas DataFrame, use df.dropna().index.
- Getting integer indexes of rows with NaN in Pandas DataFrameTo get the integer indexes of rows with missing values (NaN) in Pandas DataFrame, use either the isna(~) method or all(~) method along with NumPy's where(~) method.
- Getting rows with missing values (NaNs) in Pandas DataFrameTo get rows with missing values (NaNs) in a Pandas DataFrame, use df[df.isna().any(axis=1)].
- Getting rows with missing values (NaNs) in certain columns in Pandas DataFrameTo get rows with missing values in a specific column in Pandas DataFrame, use df[df[column_name].isna()].
- Mapping NaN values to 0 and non-NaN values to 1 in Pandas DataFrameTo map NaN values to 0 and non-NaN values to 1 in Pandas DataFrame, use the methods notnull() and astype("int").
- Mapping NaN values to False and non-NaN values to True in Pandas DataFrameTo map NaN values to False and non-NaN values to True in Pandas DataFrame, use the notnull() method.
- Removing columns where some rows contain missing values (NaNs) in Pandas DataFrameTo remove columns where some rows contain missing values (NaN), use the Pandas DataFrame's dropna(~) method.
- Removing rows from a DataFrame with missing values (NaNs) in PandasTo remove rows from a Pandas DataFrame whose value for a specific column is missing (NaN), use the DataFrame's dropna(~) method.
- Replacing NaN with blank string in Pandas DataFrameTo replace missing values (NaN) with a blank string in Pandas, use the DataFrame's fillna("") method.
- Replacing missing values (NaNs) for certain columns in Pandas DataFrameTo replace missing values (NaNs) present in certain columns, use the Pandas DataFrame's fillna(~) method.
- Replacing missing values (NaNs) with preceding values in Pandas DataFrameTo replace missing values (NaNs) with preceding values, use the Pandas DataFrame's fillna(method="ffill") method.
- Replacing all missing values (NaNs) of a Pandas DataFrameTo replace all missing values (NaNs) in a Pandas DataFrame, use the DataFrame's fillna(~) method.
- Replacing all NaN values with zeros in a Pandas DataFrameTo replace all missing values (NaN) with zeros in a Pandas DataFrame, use the fillna(~) method.
- Replacing missing values in Pandas DataFrameTo replace missing values (NaN) in Pandas DataFrame, use the fillna(~) method.
- Replacing missing values with constants in Pandas DataFrameTo replace missing values with a constant in Pandas DataFrame, use the fillna(~) method.
- Replacing values with NaNs in Pandas DataFrameTo replace values with NaNs, use the Pandas DataFrame's replace(~) method.
- Using interpolation to fill missing values (NaNs) in Pandas DataFrameTo fill missing values using interpolation in Pandas, use the DataFrame's interpolate(~) method.
- Adding missing dates in Datetime Index in Pandas DataFrame
Miscellaneous Cookbook
- Adjusting number of rows that are printed in Pandas DataFrameTo adjust the number of rows of a DataFrame that are printed in Pandas, use pd.set_option('display.max_rows', n) where n is the number of rows you want to show.
- Appending DataFrame to an existing CSV file in PandasTo append a DataFrame to an existing CSV file in Pandas, use the to_csv(~, mode="a") method.
- Checking differences between two indexes in PandasTo check the differences between two indexes in Pandas use the Index.difference(~) method.
- Checking if a DataFrame is empty in PandasTo check if a Pandas DataFrame is empty, use the DataFrame's empty property. An empty DataFrame is defined as those with no values in them.
- Checking if a variable is a DataFrame in PandasTo check if a variable is a DataFrame in Pandas, use the built-in isinstance(~) method.
- Checking if index is sorted in PandasTo check if the index of a DataFrame is sorted in ascending order use the is_monotonic_increasing property. Similarly, to check for descending order use the is_monotonic_decreasing property.
- Checking if value exists in Index in Pandas DataFrameTo check if a value exists in the Index of a Pandas DataFrame, use the in keyword on the index property,
- Checking memory usage of DataFrame in PandasTo check the memory usage of a DataFrame in Pandas we can use the info(~) method or memory_usage(~) method. The info(~) method shows the memory usage of the whole DataFrame, while the memory_usage(~) method shows memory usage by each column of the DataFrame.
- Checking whether a Pandas object is a view or a copyTo check whether a Pandas object is a view or a copy, use the _is_view property.
- Concatenating a list of DataFrames in PandasTo concatenate a list of DataFrames in Pandas either vertically or horizontally, use the concat(~) method.
- Converting DataFrame to a list of dictionaries in PandasTo convert a DataFrame to a list of dictionaries in Pandas, use the DataFrame's to_dict(orient="records") method.
- Converting DataFrame to list of tuples in PandasTo convert a DataFrame df into a list of tuples in Pandas, use list(df.itertuples(index=False)).
- Converting a DataFrame to a Series in PandasTo convert a DataFrame into a Series, use the squeeze() method, which reduces a DataFrame with a single row or column to a Series.
- Converting a DataFrame to a list in PandasTo convert a Pandas DataFrame into a 2D standard Python list, use the values property followed by the tolist method.
- Counting the number of negative values in Pandas DataFrameTo count the total number of negative values in Pandas DataFrame df, call (df < 0).sum().sum().
- Creating a Pandas DataFrame using cartesian product of two DataFramesTo create a new Pandas DataFrame using the cartesian product of two DataFrames, use the method merge(~) with the parameter how='cross'.
- Displaying DataFrames side by side in PandasTo print DataFrames side-by-side in Pandas, set the inline option when calling display_html(~).
- Displaying full non-truncated DataFrame values in PandasTo display full non-truncated DataFrame values use the pd.set_option(~) method.
- Drawing frequency histogram of Pandas DataFrame columnTo draw a frequency histogram of a Pandas DataFrame, use the plt.hist of the matplotlib library.
- Exporting Pandas DataFrame to PostgreSQL tableTo connect with the PostgreSQL database, we must use the create_engine(~) method of the sqlalchemy library, and then use Pandas DataFrame's to_sql(~) method.
- Highlighting Pandas DataFrame cell based on value in Jupyter NotebookTo highlight a Pandas DataFrame cell based on value in Jupyter Notebook, use df.style.applymap(highlighter).
- Highlighting a particular cell of a DataFrame in PandasTo highlight a particular cell of a Pandas DataFrame, use the DataFrame's style.apply(~) method.
- How to solve "ValueError: If using all scalar values, you must pass an index" in PandasTo solve "ValueError: If using all scalar values, you must pass an index" in Pandas, either pass a list of values instead, or define an index.
- Importing BigQuery table as Pandas DataFrameTo import a BigQuery table as a DataFrame, Pandas offer a built-in method called read_gbq that takes in as argument a query string as well as a path to the JSON credential file for authentication.
- Plotting two columns of Pandas DataFrameTo plot two columns of Pandas DataFrame, import matplotlib and extract the columns as a Series.
- Printing DataFrame on a single line in PandasTo print a DataFrame on a single line instead of across multiple lines use the pd.set_option(~) method setting the 'expand_frame_repr' setting to False.
- Printing DataFrame without index in PandasTo print a DataFrame without the index in Pandas, use the to_string(~) method.
- Printing DataFrames in tabular format in PandasTo print DataFrames in tabular (table) format in Pandas, import the tabulate library and use the tabulate method.
- Randomly splitting DataFrame into multiple DataFrames of equal size in PandasTo randomly split a DataFrame into multiple DataFrames of equal size in Pandas, shuffle the rows using sample(~) method and then call NumPy's arraysplit(~) method.
- Reducing DataFrame memory size in PandasThere are two main ways to reduce DataFrame memory size in Pandas without necessarily compromising the information contained within the DataFrame: Use smaller numeric types Convert object columns to categorical columns
- Saving Pandas DataFrame as Excel fileTo save a Pandas DataFrame as an Excel file, use the DataFrame's to_excel(~) method.
- Saving Pandas DataFrame as feather fileTo save the DataFrame as a feather file, use df.to_feather("file_name.feather").
- Saving a Pandas DataFrame as a CSV fileTo save a Pandas DataFrame as a CSV file, use the DataFrame's to_csv(~) method.
- Setting all values to zero in Pandas DataFrameTo set all values to zero in Pandas DataFrame, use the iloc property like df.iloc[:] = 0.
- Showing all dtypes without truncation in Pandas DataFrameTo show all dtypes without truncation in Pandas DataFrame, use with pd.option_context('display.max_rows', None).
- Splitting DataFrame into multiple DataFrames based on value in PandasTo split a DataFrame into multiple DataFrames based on values in a column in Pandas, perform a groupby(~) on the column, and then call the methods tuple(~) and dict(~).
- Splitting DataFrame into smaller equal-sized Pandas DataFramesTo split this Pandas DataFrame into smaller equal-sized DataFrames, use NumPy's array_split(~) method.
- Writing Pandas DataFrame to SQLiteTo write a Pandas DataFrame to SQLite, use the sqlite3 library and use the pd.to_sql(~) method.
- Adjusting number of rows that are printed in Pandas DataFrame
Multi-index Operations Cookbook
- Combining multiple DataFrames into one DataFrame in PandasTo combine multiple DataFrames into a single DataFrame, use the pd.concat(~) method.
- Resetting MultiIndex of a DataFrame in PandasTo reset the multi-index of a DataFrame in Pandas, use the DataFrame's reset_index() method.
- Setting multi-index using two columns of a DataFrame in PandasTo set a multi-index to a DataFrame using two of its columns, use the DataFrame's set_index(~) method.
- Sorting a multi-index DataFrame in PandasTo sort a multi-index DataFrame, use the DataFrame's sort_index(~) method.
- Combining multiple DataFrames into one DataFrame in Pandas
Row and Column Operations Cookbook
- Adding a column that contains the difference of consecutive rows in Pandas DataFrameTo add a column that contains the difference of consecutive rows in Pandas DataFrame, use the diff(~) method.
- Adding a constant number to DataFrame columns in PandasWe can add a constant number to DataFrame columns in Pandas using the + operator or the add(~) method.
- Adding an empty column to a DataFrame in PandasTo add an empty column to a Pandas DataFrame, use the DataFrame's assign(~) method.
- Adding column to DataFrame with constant values in PandasTo add a column of constants in Pandas DataFrame, directly use the square bracket notation, [].
- Adding new columns to a DataFrame in PandasTo add new columns to a DataFrame in Pandas, use the DataFrame's assign(~) method.
- Appending rows to a Pandas DataFrameTo append a single row (a list or Series) or multiple rows to a Pandas DataFrame, use the DataFrame's append(~) method.
- Applying a function that takes as input multiple column values in PandasTo apply a function that takes as input multiple column values in Pandas, use the DataFrame's apply(~) method.
- Applying a function to a single column of a DataFrame in PandasTo apply a function to a single column in Pandas DataFrame, use [] syntax rather than using the apply method.
- Changing column type to categorical in PandasTo change column type to categorical in Pandas, use the DataFrame's astype("category") method.
- Changing the name of a DataFrame's index in PandasTo change the label of a specific index, use the Pandas DataFrame's rename method, or the index property.
- Changing the order of columns in a Pandas DataFrameTo change the order of two columns in Pandas DataFrame, either use [] syntax like df = df[["B","A","C"]], or use the reindex method.
- Changing the type of a DataFrame's column in PandasTo change the data type of a DataFrame's column in Pandas, use the Series' astype(~) method.
- Changing the type of a DataFrame's index in PandasTo change the type of a DataFrame's index in Pandas, use the DataFrame.index.astype(~) method.
- Checking if a DataFrame column contains some values in PandasTo check if a DataFrame column contains some values in Pandas, chain the methods isin(~) and any(~).
- Checking if a column exists in a DataFrame in PandasTo check if column A, exists in a DataFrame df in Pandas, use "A" in df.columns.
- Checking if a value exists in a DataFrame in PandasTo check if a value exists in the Pandas DataFrame, use the built-in in operator against the DataFrame's values property.
- Checking if column is numeric in Pandas DataFrameTo check if a column is numeric in a Pandas DataFrame, use df['A'].dtype.kind in 'iufc'.
- Checking the data type of columns in a Pandas DataFrameTo check the data type of columns in a Pandas DataFrame, use the DataFrame's dtypes property.
- Checking whether column values match or contain a pattern in Pandas DataFrameTo check whether column values match or contain a pattern in Pandas DataFrame, use the Series' str.contain(~) method.
- Combining two columns as a single column of tuples in PandasTo combine two columns as a single column of tuples in Pandas, use the DataFrame's apply(tuple) method.
- Combining two columns of type string in a Pandas DataFrameTo combine columns A and B of type string in Pandas DataFrame to form a new column C, use df["C"] = df["A"] df["B"].
- Computing the average of columns in Pandas DataFrameTo compute the average of columns in Pandas DataFrame, use the mean(~) method.
- Computing the correlation between columns in Pandas DataFrameTo compute the correlation between columns in Pandas DataFrame, use the corr(~) method.
- Concatenating DataFrames horizontally in PandasTo concatenate DataFrames horizontally in Pandas, use the concat(~) method with axis=1.
- Concatenating DataFrames vertically in PandasTo concatenate DataFrames vertically in Pandas, use the concat(~) method.
- Converting Index to list in PandasTo convert an Index to a Python list in Pandas, use Index's tolist() method.
- Converting a row to column labels in PandasTo convert a row to column labels in Pandas DataFrame, directly assign the DataFrame's columns property to the reference of the row (Series).
- Converting categorical type to int in Pandas DataFrameTo change the column type from category to int in Pandas DataFrame, use the factorize(~) method.
- Converting column to list in PandasTo convert a column into a Python list in Pandas, either use Series' to_list(~) method or use Python's built-in list(~) method.
- Converting percent strings into numeric in Pandas DataFrameTo convert percent strings into numeric type in Pandas DataFrame, first strip the trailing % character and then perform type conversion using astype(float).
- Converting the index of a DataFrame into a columnTo convert the index of a Pandas DataFrame into a column, use the DataFrame's reset_index() method.
- Counting duplicate rows in Pandas DataFrameTo count the number of duplicate rows in a Pandas DataFrame, use the DataFrame's duplicated(~) method.
- Counting number of rows with no missing values in Pandas DataFrameTo count the number of rows with no missing values in a Pandas DataFrame, use the combination of methods notna, all and sum.
- Counting the occurrence of values in columns of a Pandas DataFrameTo count the occurrence of a specific value in a column of a Pandas DataFrame, first obtain a boolean mask and then use the sum method to add up all the boolean Trues.
- Counting unique values in a column of a Pandas DataFrameTo count the number of unique values in a column of a DataFrame in Pandas, use the nunique(~) method.
- Counting unique values in rows of a Pandas DataFrameTo count the number of unique values in a row of a DataFrame in Pandas, use the nunique(~) method.
- Creating a new column based on other columns in Pandas DataFrameTo create a new column based on other columns for Pandas DataFrame, either use column-arithmetics for fastest performance or use assign method for complicated operations.
- Creating new column using if, elif and else in Pandas DataFrameTo create new columns using if, elif and else in Pandas DataFrame, use either the apply method or the loc property.
- Describing certain columns of a DataFrame in PandasTo describe certain columns, as opposed to all columns, in Pandas DataFrame, use the [] notation to first extract the desired columns and then use the describe(~) method.
- Dropping columns whose label contains a substring in PandasTo drop columns whose label contains a specific substring in Pandas DataFrame, use df.loc[:, ~df.columns.str.contains(substring)].
- Getting column values based on another column values in a DataFrame in PandasTo get column values based on another column values in Pandas DataFrame, use the query(~) method and then extract the desired columns.
- Getting columns as a copy in Pandas DataFrameTo get columns as a copy in Pandas DataFrame, use the copy(~) method.
- Getting columns whose label contains a substring in PandasTo get columns whose label contains a substring in Pandas, use the DataFrame's filter(~) method.
- Getting maximum value in columns of Pandas DataFrameTo get the maximum value in columns in Pandas DataFrame, use the max(~) method.
- Getting maximum value of entire DataFrame in PandasTo get the maximum value of entire DataFrame df in Pandas, use df.max().max().
- Getting mean of columns in Pandas DataFrameTo get the mean of columns in Pandas DataFrame, use the mean(~) method.
- Getting median of columns in Pandas DataFrameTo get the median of columns in Pandas DataFrame, use the median(~) method.
- Getting minimum value in columns in Pandas DataFrameTo get the minimum value of columns in Pandas DataFrame, use the min() method.
- Getting row label when calling apply in PandasTo get the row label when calling apply in Pandas, use the name property of the row.
- Getting row labels as list in Pandas DataFrameTo get the row labels as a list in Pandas, use list(df.index) where df is the DataFrame.
- Getting rows where column value contains any substring in a list in Pandas DataFrameTo get all rows where the string in a column contains certain substrings in Pandas DataFrame, construct a regular expression using the pipeline character, and then use the str.contains(~) method.
- Getting the name of a DataFrame's index in PandasTo get the name of a DataFrame's index in Pandas, use DataFrame.index.name.
- Getting the type of DataFrame's indexTo get the type of the Pandas DataFrame's index, use DataFrame.index.dtype.
- Grouping DataFrame rows into lists in PandasTo pack the values in one column (A) into a list for each group present in another column (B) in Pandas DataFrame, use df.groupby('A')["B"].agg(list), where df is the source DataFrame.
- Inserting column at a specific location in PandasTo insert a column at a specific location in Pandas, use the DataFrame's insert(~) method.
- Iterating over each column of a DataFrame in PandasTo iterate over each column of a DataFrame in Pandas, use the DataFrame's iteritems() method, which returns an iterator over the column labels and column values.
- Iterating over each row of a DataFrame in PandasTo iterate over each row of a DataFrame in Pandas, use the DataFrame's iterrows() method, which returns an iterator over the row labels and row values.
- Modifying rows of a Pandas DataFrameTo modify a row of a Pandas DataFrame, either use the DataFrame's loc or iloc property.
- Modifying values in Index of Pandas DataFrameTo modify values in the Index of a Pandas DataFrame, use the rename(~) method or perform direct assignment.
- Removing columns from a DataFrame in PandasTo remove a column from a DataFrame in Pandas, use the DataFrame's drop(~) method.
- Removing columns using column labels in Pandas DataFrameTo remove columns using column labels in Pandas DataFrame, use the drop(~) method.
- Removing columns using integer index in Pandas DataFrameTo drop columns using integer index in Pandas DataFrame, call the drop(~) method.
- Removing columns with all missing values in Pandas DataFrameTo remove columns with all missing values in Pandas, use the DataFrame's dropna(how="all", axis=1) method.
- Removing columns with some missing values in Pandas DataFrameTo remove columns with some missing values in Pandas DataFrame, use the dropna(axis=1) method.
- Removing duplicate columns in Pandas DataFrameTo remove duplicate columns in Pandas DataFrame, use the duplicated(~) method.
- Removing duplicate rows in Pandas DataFrameTo remove duplicate rows from a Pandas DataFrame, use the drop_duplicates(~) method.
- Removing first n rows of a DataFrame in PandasTo remove the first n rows of a DataFrame in Pandas, use the iloc property.
- Removing multiple columns in Pandas DataFrameTo remove multiple columns in Pandas DataFrame, use the drop(~) method.
- Removing prefix from column labels in Pandas DataFrameTo remove prefix from column labels in Pandas DataFrame, use the str.lstrip(~) method.
- Removing rows at random without shuffling in Pandas DataFrameTo remove rows at random without shuffling in Pandas DataFrame, first get an array of randomly selected row index labels, and then use the drop(~) method to remove the rows.
- Removing rows from a DataFrame based on column values in PandasTo remove rows from a Pandas DataFrame based on column values, use the DataFrame's query(~) method.
- Removing rows using integer index in Pandas DataFrameTo remove rows using integer index in Pandas DataFrame, first get the name of the row index using iloc, and second use the drop(~) method to remove the row.
- Removing rows with all zeros in Pandas DataFrameTo remove rows with all zeros in Pandas DataFrame, use df[~(df == 0).all(axis=1)] where df is the DataFrame.
- Removing suffix from column labels in Pandas DataFrameTo remove suffix from column labels in Pandas DataFrame, use the str.rstrip(~) method.
- Renaming columns of a DataFrame in PandasTo rename the columns of a DataFrame in Pandas, use either the rename(~) method or the columns property.
- Replacing substring in column values in Pandas DataFrameTo replace substrings in column values in Pandas DataFrame, use the Series' str.replace(~) method.
- Returning multiple columns using the apply function in PandasTo return multiple columns using the apply(~) function in Pandas, make the parameter function return a Series.
- Reversing the order of rows in Pandas DataFrameTo reverse the order of rows in Pandas DataFrame, use iloc[::-1].
- Setting a new index of a DataFrame in PandasTo replace the index of a DataFrame in Pandas, directly assign a new array to the index property or use the DataFrame's set_index method.
- Setting an existing column as the new index of a Pandas DataFrameTo set an existing column as the new index of a Pandas DataFrame, use the set_index(~) method.
- Setting column as the index in Pandas DataFrameTo set a column as the index of a Pandas DataFrame, use the set_index(~) method.
- Setting integers as column labels in Pandas DataFrameTo set the column labels to be incremental integers in Pandas DataFrame, use df.columns = range(0, df.columns.size).
- Showing all column labels in Pandas DataFrameTo show all column labels of a Pandas DataFrame, access the values property of columns.
- Shuffling the rows of a Pandas DataFrameTo shuffle the rows of a Pandas DataFrame, use the DataFrame's sample(frac=1) method.
- Sorting Pandas DataFrame alphabeticallyTo sort a Pandas DataFrame alphabetically, use sort_values(~).
- Sorting DataFrame by column labels in PandasTo sort a DataFrame by column labels in Pandas, use the DataFrame's sort_index(axis=1) method.
- Sorting a Pandas DataFrame by columnTo sort a Pandas DataFrame by column, use the DataFrame's sort_values(~) method.
- Sorting a DataFrame by index in PandasTo sort a DataFrame by index in Pandas, use the DataFrame's sort_index(~) method.
- Splitting a column of strings into multiple columns in PandasTo split a column of strings into multiple columns in Pandas DataFrame, use Series' split string method.
- Splitting column of lists into multiple columns in PandasTo split a column, which contains lists, into multiple columns, use pd.concat([df, df["A"].apply(pd.Series)], axis=1).
- Splitting dictionary into separate columns in Pandas DataFrameTo split dictionaries into separate columns in Pandas DataFrame, use the apply(pd.Series) method.
- Stripping substrings from values in columns in PandasTo strip substrings from values in a column of a Pandas DataFrame, use the str.strip(~) helper method.
- Stripping whitespace from columns in PandasTo strip whitespace from columns in Pandas we can use the str.strip(~) method or the str.replace(~) method.
- Stripping whitespaces in column labels in Pandas DataFrameTo strip whitespaces in column labels in Pandas DataFrame, use the str.strip() method.
- Summing a column of a DataFrame in PandasTo compute the sum of columns in Pandas, use the DataFrame's sum(~) method.
- Summing rows of specific columns in PandasTo sum rows of specific columns of a Pandas DataFrame, say columns A and C, call df["A"] + df["C"] where df is the DataFrame.
- Swapping the rows and columns of a DataFrame in PandasTo swap the rows and columns of a DataFrame in Pandas, use the DataFrame's transpose(~) method.
- Unstacking certain columns only in Pandas DataFramePandas' unstack(~) method does not allow you to select specific columns to unstack. To unstack certain columns only, use the Pandas melt(~) method.
- Updating a row while iterating over the rows of a DataFrame in PandasTo update a row while iterating over the rows of a DataFrame in Pandas, use the itertuples(~) method along with the at property for value assignment.
- Updating rows based on column values in Pandas DataFrameTo update rows based on column values in Pandas DataFrame, use the loc property.
- Using apply method in parallel to Pandas DataFrameTo run Pandas' apply(~) in parallel, use Dask, which is an easy-to-use library that performs Pandas' operations in parallel by splitting up the DataFrame into smaller partitions.
- Adding a column that contains the difference of consecutive rows in Pandas DataFrame
Selecting Data Cookbook
- Accessing a single value of a DataFrame in PandasTo access a single value of a DataFrame in Pandas, use the DataFrame.iloc property (via integer indices), or use the DataFrame.loc property (via row and column labels).
- Accessing columns of a DataFrame using column labels in PandasTo access specific columns of a Pandas DataFrame with their columns labels, directly use DataFrame[~] or use the DataFrame.loc property.
- Accessing columns of a DataFrame using integer indices in PandasTo access columns of a DataFrame using integer indices in Pandas, use the DataFrame.iloc property.
- Accessing rows of a DataFrame using integer indices in PandasTo access rows of a DataFrame using integer indices in Pandas, use DataFrame.iloc property.
- Accessing rows of a DataFrame using row labels in PandasTo access specific rows of a Pandas DataFrame with row labels, use the DataFrame.loc property.
- Accessing the first n rows of a Pandas DataFrameTo access the first n rows of a Pandas DataFrame, use either the DataFrame's head(n) method or iloc property.
- Accessing the last n rows of a Pandas DataFrameTo access the last n rows of a Pandas DataFrame, use the DataFrame's tail(n) method.
- Accessing values of a multi-index DataFrame in PandasTo access values of a multi-index DataFrame, use the loc property.
- Adding prefix to column labels in Pandas DataFameTo prepend a prefix to column labels in Pandas, use the DataFrame's add_prefix(~) method.
- Adding suffix to column labels in Pandas DataFrameTo append a suffix to column labels in Pandas, use the DataFrame's add_suffix(~) method.
- Converting two columns into a dictionary in Pandas DataFrameTo convert two columns into a dictionary in Pandas DataFrame, first extract the two columns as Series, and then pass them into dict(zip(~)).
- Excluding columns based on type in Pandas DataFrameTo exclude columns based on the data type in Pandas DataFrame, use the DataFrame's select_dtypes(~) method with the exclude parameter.
- Extracting values of a DataFrame as a Numpy array in PandasTo extract all values of a DataFrame as a Numpy array, use the DataFrame's values property or the to_numpy(~) method.
- Getting a list of all the column labels of a Pandas DataFrameTo get the column labels of a Pandas DataFrame as a Python standard list, use list(df.columns).
- Getting all columns except one in Pandas DataFrameTo get all columns except one in Pandas, use the DataFrame's drop(~) method.
- Getting all duplicate rows in PandasTo get all duplicate rows as a Pandas DataFrame, use the DataFrame's duplicated(~) method.
- Getting all numeric columns of a DataFrame in PandasTo get all numeric columns of a Pandas DataFrame, use the select_dtypes(~) method.
- Getting all unique values of columns in PandasTo get all unique values of certain columns in a DataFrame, use Pandas' unique(~) method.
- Getting column label of max value in each row in Pandas DataFrmeTo get the column label of the max value in each row of a Pandas DataFrame, use the idxmax(axis=1) method.
- Getting column label of minimum value in each row in Pandas DataFrameTo get the column label of the minimum value in each row in Pandas DataFrame, use the idxmin(axis=1) method.
- Getting columns by data type in Pandas DataFrameTo get columns by data type in Pandas, use the DataFrame's select_dtypes(~).
- Getting columns using integer index in Pandas DataFrameTo get columns using integer index in Pandas DataFrame, use the iloc property.
- Getting earliest or latest date from Pandas DataFrameTo get the earliest or latest date from Pandas DataFrame, use min(~) and max(~).
- Getting every nth row in Pandas DataFrameTo get every nth row in Pandas DataFrame, use df.iloc[::n, :].
- Getting first row value of a column in Pandas DataFrameTo get the first row value of a column in Pandas DataFrame, use the iloc property.
- Getting index of Series where value is TrueTo get the index of values that equal True in Pandas, use s[s].index where s is a Series.
- Getting indexes of rows matching conditions in Pandas DataFrameTo get the indexes of row matching a certain condition in Pandas DataFrame, use the query(~) method to perform filtering, and then fetch the corresponding index using index.
- Getting integer index of a column using its column label in PandasTo get the integer index of a column using its column label of a Pandas DataFrame, use the get_loc method of the Index object.
- Getting integer index of rows based on column values in Pandas DataFrameTo get the integer indexes of rows based on column values in Pandas DataFrame, use NumPy's where(~) method.
- Getting multiple columns in Pandas DataFrameTo get multiple columns in Pandas DataFrame, use either the [] syntax directly or use properties like loc and iloc.
- Getting number of columns of a Pandas DataFrameTo get the total number of columns of a Pandas DataFrame, use the shape property, which returns a tuple containing the number of rows and columns.
- Getting row with largest index value in Pandas DataFrameTo get rows with the largest index value in Pandas DataFrame df, use df.iloc[df.index.argmax()].
- Getting row with smallest index value in Pandas DataFrameTo get rows with the smallest index value in Pandas DataFrame df, use df.iloc[df.index.argmin()].
- Getting rows based on multiple column values in Pandas DataFrameTo get rows based on multiple column values in Pandas DataFrame, use the query(~) method.
- Getting rows except some in Pandas DataFrameTo get all rows in a Pandas DataFrame except those at certain integer indexes or with certain labels, use the drop(~) method.
- Getting rows from a DataFrame based on column values in PandasTo get rows from a Pandas DataFrame based on column values, use the DataFrame's query(~) method.
- Getting rows that are not in other DataFrame in PandasTo get rows from a DataFrame that are not in another DataFrame in Pandas, perform a left join on all the columns using merge(~), and then use the query(~) method to extract the non-matched rows.
- Getting rows using OR statement in Pandas DataFrameTo get rows using OR statement in Pandas DataFrame, use the query(~) method.
- Getting rows where column values are of specific length in Pandas DataFrameTo get rows where column values have a specific length (n) in Pandas DataFrame, use df.query("A.str.len() == 3", engine="python").
- Getting rows where value is between two values in Pandas DataFrameTo get rows where value is between two values in Pandas DataFrame, use the query(~) method.
- Getting rows where values do not contain substring in Pandas DataFrameTo get rows where values do not contain a substring in Pandas DataFrame, use str.contains(~) with the negation operator ~.
- Getting shortest and longest strings in Pandas DataFrameTo get the shortest or longest strings in Pandas DataFrame column, first compute the shortest or longest string length using Series.str.len() method, and then obtain a boolean mask to filter the shortest or longest strings.
- Getting the column labels of a DataFrame in PandasTo get the column labels of a DataFrame in Pandas, use the DataFrame.columns property.
- Getting the first column in Pandas DataFrameTo get the first column in Pandas DataFrame, use the iloc property.
- Getting the index of a DataFrame in PandasTo get the index of a DataFrame in Pandas, use the DataFrame.index property.
- Getting the length of the longest string in a column in Pandas DataFrameTo get the length of the longest string in a column (A) of a Pandas DataFrame (df), use df["A"].str.len().max().
- Getting the longest string in a column in Pandas DataFrameTo get the longest string in a column in Pandas DataFrame, first get the length of each string using the str.len() method, and then use NumPy's where(~) method to get the integer indexes of the maximums. Finally, pass this integer indexes into iloc to get the desired rows.
- Getting the row with the maximum column value in Pandas DataFrameTo get the row with the maximum column value in Pandas, use the DataFrame's nlargest(~) method.
- Getting the row with the minimum column value in Pandas DataFrameTo get the row with the smallest column value in Pandas, use the DataFrame's nsmallest(~) method.
- Getting the shape of a DataFrame in PandasTo get the shape of a DataFrame in Pandas, use the DataFrame.shape property.
- Getting the total number of rows of a Pandas DataFrameTo get the total number of rows of a Pandas DataFrame, use the shape property, which is a tuple containing the number of rows and columns.
- Getting the total number of values in a Pandas DataFrameTo get the total number of values in a Pandas DataFrame, use the DataFrame's size property.
- Making column labels all lowercase in Pandas DataFrameTo make column labels of a Pandas DataFrame all lowercase, use df.columns = df.columns.str.lower() where df is the DataFrame.
- Making column labels all uppercase in Pandas DataFrameTo make column labels of a Pandas DataFrame all uppercase, use df.columns = df.columns.str.upper() where df is the DataFrame.
- Randomly select rows based on a condition from a Pandas DataFrameTo randomly select rows based on a specific condition in Pandas, first use DataFrame.query(~) method to extract rows that meet the condition, and then use DataFrame.sample(~) method to randomly select n rows.
- Randomly selecting n columns from a DataFrameTo randomly select n columns from a Pandas DataFrame, use the DataFrame's sample(~) method.
- Randomly selecting n rows from a DataFrameTo select n rows from a DataFrame randomly, use the DataFrame's sample(~) method.
- Reassigning column values in Pandas DataFrameTo reassign column values in Pandas DataFrame, use the [] syntax to get a view of the column, and then perform the assignment using =.
- Retrieving DataFrame column values as a NumPy arrayWe can use the DataFrame.values property to return the values of a DataFrame column as a NumPy array.
- Selecting a single column as a Pandas DataFrameTo get a column as a Pandas DataFrame instead of a Series, use df.loc[:,['A']].
- Selecting columns of a DataFrame using regex in PandasWe can select columns of a DataFrame using regex through the filter(~) method. The method applies filtering based on the labels of the columns/rows, and not on the actual data.
- Selecting columns that do not begin with certain prefix in Pandas DataFrameTo select columns that do not begin with a certain prefix in Pandas DataFrame, first get the positions of the columns that do not begin with the specified prefix using columns.str.startwith(~), and then use iloc to select those columns.
- Selecting columns with certain prefix in Pandas DataFrameTo select columns that begin with a certain prefix in Pandas DataFrame, first fetch the locations of the columns that start with a prefix using columns.str.startwith(~), and then use iloc to select those columns.
- Selecting last column of Pandas DataFrameTo select the last column of Pandas DataFrame, use df[df.columns[-1]].
- Selecting n rows with the smallest values for a column in PandasTo select the n rows with the smallest values for a column in Pandas, use the DataFrame's nsmallest(~) method.
- Selecting rows based on a condition in PandasTo extract rows from a DataFrame using a boolean mask in Pandas, simply use the [~] notation like df[mask].
- Selecting rows based on dates in PandasIn Pandas, to select rows based on dates, use DataFrame's query(~) method.
- Selecting rows from a Pandas DataFrame whose column values are NOT contained in a listTo select rows from a Pandas DataFrame whose column values are NOT contained in a list, use the DataFrame's query(~) method.
- Selecting rows from a Pandas DataFrame whose column values are contained in a listTo select rows from a Pandas DataFrame whose column values are contained in a list, use the DataFrame's query(~) method.
- Selecting rows from a Pandas DataFrame whose column values contain a substringTo select rows from a Pandas DataFrame where certain column values contain a specific substring, use the column's str.contains(~) method.
- Selecting rows starting with substring in Pandas DataFrameTo select rows of a Pandas DataFrame starting with a specified substring we can use the str.startswith(~) method.
- Selecting top n rows with the largest values for a column in PandasTo select the top n rows with the largest values in a column in Pandas, use the DataFrame's nlargest(~) method.
- Splitting Pandas DataFrame based on column valuesTo split a Pandas DataFrame based on column values, first build a mask of booleans that indicate rows where condition is satisfied. Next, use df[mask] and df[~mask] to obtain two separate DataFrames.
- Accessing a single value of a DataFrame in Pandas
Time Series Cookbook
- Adding new column containing the difference between two date columns in Pandas DataFrameTo add a new column containing the difference between two date columns in Pandas DataFrame, perform date arithmetics on the columns using the standard binary operators.
- Combining columns containing date and time in PandasTo combine columns containing date and time in Pandas DataFrame, use string concatenation and then use to_datetime(~) to convert the datetime string to type datetime64.
- Combining columns of years, months and days in PandasTo combine the Year, Month and Day columns to form another column in Pandas DataFrame, perform string concatenation and then use the to_datetime(~) method.
- Converting DatetimeIndex to Series of datetime in PandasTo convert DatetimeIndex into a Series of type datetime64 in Pandas, pass it into the Series constructor.
- Converting UNIX timestamp to datetime in PandasTo convert UNIX timestamp to datetime, use Pandas to_datetime(~) method.
- Converting a DataFrame column of strings to datetime in PandasTo convert a DataFrame column of strings to datetime, use Pandas to_datetime(~) method.
- Converting dates to strings in Pandas DataFrameTo convert a date column to a Series of date strings in Pandas DataFrame, use the strftime(~) method.
- Converting datetime column to date and time columns in PandasIn Pandas, we first convert the datetime column into strings holding date and time information using strftime(~), and then use Series.str.split(~) to split the string into date and time columns.
- Converting index to datetime in PandasTo convert the index of a DataFrame to DatetimeIndex, use Pandas' to_datetime(~) method.
- Creating a column of dates in PandasTo create a column of dates in Pandas, use the date_range(~) method.
- Creating a range of dates in PandasTo create a range of dates in Pandas, use the date_range(~) method, which returns a DatetimeIndex.
- Extracting month and year from Datetime column in PandasTo extract the year and month of a Datetime column in Pandas, use dt.year and dt.month respectively.
- Getting all weekdays between two dates in PandasTo get all weekdays between two dates, use Pandas bdate_range(~) method.
- Getting all weekends between two dates in PandasTo get all weekends between two dates, use Pandas bdate_range(~) method.
- Getting day of week of date columns in Pandas DataFrameTo get the day of week for each date in column in Pandas DataFrame, use dt.day_name().
- Getting day unit from date column in PandasTo get the day unit from a date column in Pandas DataFrame, use the dt.day property.
- Getting month unit from date column in PandasTo get the month unit from a date column in Pandas DataFrame, use the dt.month property.
- Getting name of months in date column in PandasTo get the name of months in date column in Pandas DataFrame, use dt.month_name().
- Getting week numbers from a date column in PandasTo get the week numbers from a date column in Pandas DataFrame, use the Series' dt.isocalendar().week property.
- Getting year unit from date column in PandasTo get the year unit from a date column in Pandas DataFrame, use the dt.year property.
- Modifying dates in PandasTo modify dates (Timestamp and datetime) in Pandas, use the replace(~) method.
- Offsetting datetime in PandasTo offset datetime in Pandas, initialise a Timedelta object and perform date arithmetics.
- Removing time unit from dates in PandasTo remove the time unit from dates in Pandas, use the normalize() method.
- Setting date to beginning of month in PandasTo set the dates in a column to the beginning of the month in Pandas DataFrame, use astype("datetime64[M]").
- Sorting DataFrame by dates in PandasTo sort by dates in Pandas DataFrame, use either sort_values(~) or sort_index(~).
- Using dates as the index of a DataFrame in PandasTo use dates as the index of a DataFrame in Pandas, you can either use Pandas' date_range(~) method, which returns a DatetimeIndex, or use to_datetime(~) to convert an existing index to DatetimeIndex.
- Adding new column containing the difference between two date columns in Pandas DataFrame
Series Cookbook
- Appending values to a Series in PandasTo append a single value in Pandas Series, perform direct assignment using [] syntax. To append multiple values, use the Pandas' concat method.
- Applying a function to Series in PandasTo apply a function to a Pandas Series, use the apply(~) method.
- Binning values in a Pandas SeriesTo bin the values of a Series, use the Pandas' cut(~) method.
- Changing data type of Series in PandasTo change the data type of a Series in Pandas, use the astype(~) method.
- Checking if Series has missing values in PandasTo check if a Series has missing values in Pandas, use the hasnans property.
- Checking if a value is NaN in Pandas SeriesTo check if some value is NaN (missing) in a Pandas Series, use the isna(~) method.
- Checking if all values are NaN in Pandas SeriesTo check if all values in a Pandas Series are missing (NaN), chain the methods isnull and all.
- Checking if all values in Series are unique in PandasTo check if all the values in a Series is unique in Pandas, use the Series's is_unique property.
- Converting Python list to Pandas SeriesTo convert a Python list into a Pandas Series, directly pass the list into the Series constructor.
- Converting Series of lists into DataFrame in PandasTo horizontally expand the lists and convert the Series into a Pandas DataFrame, first convert the Series into a Python list, and then pass this into the DataFrame constructor.
- Converting Pandas Series to Python listTo convert a Pandas Series to Python list, either use to_list(~) or Python's built-in list(~).
- Converting Series to a Numpy arrayTo convert a Series to a NumPy array in Pandas, use to_numpy(~) method.
- Counting frequency of values in Pandas SeriesTo count the frequency of values in a Pandas Series, use the value_counts(~) method.
- Creating a Series of zeroes in PandasTo create a Series with zeros in Pandas, directly call the Series constructor.
- Creating a Series with constant value in PandasTo create a Series with constant value in Pandas, directly call the Series constructor.
- Filtering strings based on length in Pandas SeriesTo filter strings based on length in Pandas Series, first get the length of each string using str.length method, and then create a boolean mask of strings that have the length specified. Finally, pass in this mask into loc property to fetch the corresponding entries in the Series.
- Filtering values of a Series in PandasTo filter values of a Series in Pandas, pass in a filter function to the loc property.
- Getting frequency counts of values in intervals in Pandas SeriesTo get the frequency counts of values that fall under some intervals, first use Pandas' cut(~) method to partition the values into bins (segments), and then use value_counts(~) to get the corresponding frequency counts.
- Getting index of largest value in Pandas SeriesTo get the index of the largest value in a Pandas Series, use the idxmax(~) method.
- Getting index of smallest value in Pandas SeriesTo get the index of the largest value in a Pandas Series, use the idxmin(~) method.
- Getting index of value in Series in PandasTo get the index of a value in a Pandas Series, directly use the [] syntax. To get the integer index of a value in Pandas Series, first convert the Series into an Index and then use the get_loc method.
- Getting integer index of largest value in Pandas SeriesTo get the integer index of the largest value in a Pandas Series, use the argmax(~) method.
- Getting integer index of smallest value in Pandas SeriesTo get the integer index of the smallest value in a Pandas Series, use the argmin(~) method.
- Getting integer index of value in Pandas SeriesTo get the integer index of values in a Pandas Series, use NumPy's where(~) method.
- Getting intersection of Series in PandasTo get the intersection of two Series as a Series in Pandas, use NumPy's intersect1d(~) method.
- Getting length of each string in Pandas SeriesTo get the length of each string in Pandas Series, use the str.len() method.
- Getting list of integer indices where value is boolean True in Pandas SeriesTo get a list of integer indices where value is True in Pandas Series, use NumPy's where(~) method.
- Getting the index of the nth value in Pandas SeriesTo get the index of the value located at the i-th integer index in Pandas Series, fetch the index using the index property, and then use square bracket notation to get the index.
- Getting the most frequent value in Pandas SeriesTo get the most frequent value in the Pandas Series, use the mode method. To get the most frequent value and its count in the Pandas Series, use the value_counts method.
- Getting value of Series using integer index in PandasTo get a value from a Pandas Series using integer index, use the iloc property.
- Grouping Series by its values in PandasTo group the values of a Pandas Series, use the groupby(~) method.
- Handling error - "Truth value of a Series is ambiguous" in Pandas'The fix is to avoid the use of "and" and "or" in Pandas, and instead opt to use bitwise operators "&" and "|", respectively.
- Inverting a Series of booleans in PandasTo invert a Series of boolean in Pandas, use ~.
- Removing missing values from a Series in PandasTo remove missing values from a Series in Pandas, use the dropna() method.
- Removing substrings from strings in a Series in PandasTo remove a substring from each string in a Pandas Series, use the str.replace(~) method.
- Removing values from Series in PandasTo remove values from a Pandas Series, either use the drop method or create a boolean mask and pass it into the loc property.
- Resetting index of Series in PandasTo reset the index of Series in Pandas, use the reset_index(~) method.
- Sorting values in a Pandas SeriesTo sort the values in a Pandas Series, use the sort_vales(~) method.
- Splitting strings based on space in Pandas DataFrameTo split strings based on space in Pandas, use the Series.str.split(~) method.
- Stripping leading and trailing whitespace in Pandas SeriesTo strip leading and trailing whitespace from each string in a Pandas Series, use the Series' str.strip(~) method.
- Taking the floor or ceiling of values in Series in PandasTo take the floor or ceiling of values in Pandas Series, use NumPy's floor(~) and ceil(~) methods.
- Using index.get_loc(~) for multiple values in PandasTo get the integer indexes of multiple values in the Pandas DataFrame's index, use df.index.get_indexer(~).
Published by Isshin Inada
Edited by 0 others
Did you find this page useful?
thumb_up
thumb_down
Comment
Citation
Ask a question or leave a feedback...
thumb_up
4
thumb_down
1
chat_bubble_outline
0
settings
Enjoy our search
Hit / to insta-search docs and recipes!