Creating DataFrames Cookbook
- Combining multiple Series into a DataFrame in Pandas
  To combine multiple Series into a single DataFrame in Pandas, use the concat(~) method or use the DataFrame's constructor.
- Combining multiple Series to form a DataFrame in Pandas
  To combine multiple Series to form a DataFrame in Pandas, use the concat(~) method.
- Converting a Series to a DataFrame in Pandas
  To convert a Series to a DataFrame in Pandas, use the Series' to_frame(~) method.
- Converting list of lists into Pandas DataFrame
  To convert a list of lists into a Pandas DataFrame, pass the list into the DataFrame(~) constructor.
- Converting list to Pandas DataFrame
  To convert a list into a Pandas DataFrame, pass the list directly into the DataFrame constructor.
- Converting percent string into a numeric for read_csv in Pandas
  To convert percent string into numeric when using read_csv(~) in Pandas, supply the appropriate converters parameter.
- Converting scikit-learn dataset to Pandas DataFrame
  To convert a scikit-learn dataset to Pandas DataFrame, use the DataFrame constructor.
- Converting string data into a DataFrame in Pandas
  To read data given in string form as a Pandas DataFrame, use read_csv(~) along with StringIO.
- Creating a DataFrame from a string in Pandas
  To create a DataFrame from a string in Pandas, use StringIO(~) to convert the string into a file-like object, and then use read_csv(~) to read this object and parse it into a DataFrame.
- Creating a DataFrame using lists in Pandas
  To create a DataFrame using lists in Pandas, directly pass the lists into the DataFrame constructor.
- Creating a DataFrame with different type for each column in Pandas
  To create a DataFrame with different type for each column in Pandas, specify the dtype parameter of the DataFrame constructor.
- Creating a DataFrame with empty values in Pandas
  To create a DataFrame with empty values (NaN) in Pandas, call the DataFrame constructor with parameters columns and index.
- Creating a Pandas DataFrame with missing values
  To create a Pandas DataFrame with missing values, pass in np.nan to the DataFrame constructor and supply the index and columns parameter.
- Creating a DataFrame with random numbers in Pandas
  To initialise a DataFrame containing random numbers in Pandas, use one of NumPy's functions to generate an array of random numbers, and then pass that into the DataFrame constructor.
- Creating a DataFrame with zeros in Pandas
  To create a DataFrame with zeros in Pandas, pass in the value 0 to the DataFrame constructor and supply the parameters index and columns.
- Creating a MultiIndex DataFrame in Pandas
  To create a MultiIndex DataFrame in Pandas, first create a MultiIndex object using methods from_tuples, from_arrays or from_product, and then pass this object directly to the DataFrame constructor.
- Creating a Pandas DataFrame
  To create a Pandas DataFrame, use the DataFrame(~) constructor.
- Creating a single DataFrame from multiple files in Pandas
  To create a single DataFrame from multiple files in Pandas, read the files individually using read_csv(~), and then concatenate them using concat(~).
- Creating empty DataFrame with only column labels in Pandas
  To create an empty DataFrame with only column labels in Pandas, supply only the columns parameter in the DataFrame constructor.
- Filling missing values when using read_csv in Pandas
  To fill missing values when reading a file in Pandas, call the fillna(~) method after calling read_csv(~).
- Importing Dataset in Pandas
  To import and read a CSV file as a DataFrame, use Pandas' read_csv(~) method.
- Comprehensive guide on importing tables from PostgreSQL as Pandas DataFrames
  To read a PostgreSQL table as a Pandas DataFrame, first establish a connection to the server using sqlalchemy, and then use Pandas' read_sql(~) method to create a DataFrame.
- Initialising a DataFrame using a constant in Pandas
  To initialise a DataFrame using a single constant in Pandas, pass the constant into a DataFrame constructor and specify the parameters columns and index.
- Initialising a DataFrame using a dictionary in Pandas
  To initialise a DataFrame using a dictionary in Pandas, pass it directly into the DataFrame constructor.
- Initialising a Pandas DataFrame using a list of dictionaries
  To initialise a Pandas DataFrame using a list of dictionaries, simply pass it to the DataFrame constructor.
- Inserting lists into a Pandas DataFrame cell
  To insert a list into a Pandas DataFrame cell, ensure the type of the column is object, and then use at or iat property to insert the list.
- Keeping leading zeroes when using read_csv in Pandas
  To keep leading zeros for read_csv(~) in Pandas, specify the type as string for that column using the dtype parameter.
- Parsing dates when using read_csv in Pandas
  To parse dates when using read_csv(~) in Pandas, supply the parse_dates parameter.
- Preventing strings from getting parsed as NaN for read_csv in Pandas
  To prevent strings from getting parsed as NaN when using read_csv(~) in Pandas, set the keep_default_na=False parameter.
- Reading URL using read_csv in Pandas
  To read a dataset that resides in some URL in Pandas, directly pass the URL into read_csv(~).
- Reading data from GitHub in Pandas
  To read data from a GitHub page in Pandas, supply the URL to the read_csv(~) method.
- Reading file without header in Pandas
  To read a file without header into a Pandas DataFrame when using read_csv(~), set header=None.
- Reading large CSV files in chunks in Pandas
  To read large CSV files in chunks in Pandas, use the read_csv(~) method and specify the chunksize parameter. This is particularly useful if you are facing a MemoryError when trying to read in the whole DataFrame at once.
- Reading n random lines using read_csv in Pandas
  To read n random lines using read_csv(~) in Pandas, first get the number of lines of the file, and then use Python's random.sample(~) function to get the row numbers to skip.
- Reading space-delimited files in Pandas
  To read space-separated files in Pandas, use read_csv(~) with either sep=" " or delim_whitespace=True (deprecated since pandas 2.2 in favour of sep=r"\s+").
- Reading specific columns from file in Pandas
  To read specific columns from a file in Pandas, we can use the read_csv(~) method and specify the usecols parameter.
- Reading tab-delimited files in Pandas
  To read a tab-delimited file using read_csv(~) in Pandas, specify the parameter sep='\t'.
- Reading the first few lines of a file to create DataFrame in Pandas
  To read the first n lines of a file to create a Pandas DataFrame when using read_csv(~), set the parameter nrows.
- Reading the last n lines of a file in Pandas
  To read only the last n lines of a file into a Pandas DataFrame for read_csv(~), first fetch the total number of lines in the file, and then use the skiprows parameter.
- Reading zipped csv file as a Pandas DataFrame
  To read a zipped csv file as a Pandas DataFrame, use Pandas' read_csv(~) method.
- Removing Unnamed:0 column in Pandas
  We can get an unwanted column named "Unnamed: 0" when creating a DataFrame from a csv file using the read_csv(~) method. To create a DataFrame without this column, we can pass index_col=0 to our read_csv(~) call.
- Resolving ParserError: Error tokenizing data in Pandas
  Common reasons for ParserError: Error tokenizing data when creating a Pandas DataFrame include:
  - Using the wrong delimiter
  - The number of fields in certain rows does not match the header
  To resolve the error we can try the following:
  - Specifying the delimiter through the sep parameter in read_csv(~)
  - Fixing the original source file
  - Skipping bad rows
- Saving Pandas DataFrame as zipped csv
  To save a Pandas DataFrame as a zipped csv, use the to_csv(~) method with compression="gzip" (or compression="zip" for a ZIP archive).
- Skipping rows without skipping header for read_csv in Pandas
  To skip rows whilst keeping the column labels when using read_csv(~) in Pandas, pass the row numbers to skip (excluding row 0, the header) to the skiprows parameter.
- Specifying data type for read_csv in Pandas
  To specify a data type for the columns when using read_csv(~) in Pandas, pass a dictionary into the dtype parameter, where the key is the column name and the value is the desired data type for that column.
- Treating missing values as empty strings rather than NaN for read_csv in Pandas
  To parse missing values as empty strings instead of NaN when using read_csv(~) in Pandas, set the parameter keep_default_na=False.
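A few of the recipes above can be sketched together. This is a minimal illustration, not the linked articles' own code: the sample data, the column names (id, rate, joined) and the helper percent_to_float are assumptions made up for the example.

```python
from io import StringIO

import pandas as pd

# Combine two Series into one DataFrame; each Series becomes a column.
s1 = pd.Series([1, 2, 3], name="A")
s2 = pd.Series([4, 5, 6], name="B")
df_concat = pd.concat([s1, s2], axis=1)

# Initialise a DataFrame from a single constant by supplying index and columns.
df_zeros = pd.DataFrame(0, index=["a", "b"], columns=["x", "y"])

# Hypothetical CSV content held in a string.
csv_text = "id,rate,joined\n007,45%,2021-01-05\n012,7.5%,2021-02-10\n"


def percent_to_float(text):
    """Hypothetical converter: turn '45%' into 0.45."""
    return float(text.strip("%")) / 100


# StringIO wraps the string in a file-like object that read_csv can parse.
df_csv = pd.read_csv(
    StringIO(csv_text),
    dtype={"id": str},                      # keep leading zeros
    converters={"rate": percent_to_float},  # percent string -> numeric
    parse_dates=["joined"],                 # parse the date column
)
```

Here dtype and converters target different columns; pandas ignores dtype for a column that also has a converter, so it is simplest not to combine them on the same column.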
Data Aggregation Cookbook
- Applying a function to multiple columns in groups in Pandas
  To group rows, and then apply a function on multiple columns in each group in Pandas, use the apply(~) method after groupby(~).
- Calculating percentiles of a DataFrame in Pandas
  To calculate percentiles in Pandas, use the quantile(~) method.
- Calculating the percentage of each value in each group in Pandas
  To compute the percentage of each value in each distinct group in Pandas, call the groupby(~) method and then pass in the following function: lambda my_df: my_df / my_df.sum().
- Computing descriptive statistics of each group in Pandas
  To get the descriptive statistics of each group in Pandas, call the describe() method after groupby(~).
- Difference between a group's count and size in Pandas
  The difference between a group's count() and size() in Pandas groupby(~) is that count() returns the number of non-NaN values for each column, whereas size() returns the length of a group, that is, its number of rows.
- Difference between methods apply and transform for groupby in Pandas
  The difference between the apply and transform methods for groupby in Pandas is that the function passed to apply takes a DataFrame representing each group as its argument, whereas the function passed to transform takes a Series representing a single column of each group.
- Getting cumulative sum of each group in Pandas DataFrame
  To compute the cumulative sum of each group in Pandas DataFrame, call df.groupby('name')['score'].cumsum().
- Getting descriptive statistics of DataFrame in Pandas
  To get the descriptive statistics of DataFrame in Pandas, use the DataFrame's describe(~) method.
- Getting multiple aggregates of a column after grouping in Pandas
  To get multiple aggregates of a column after grouping by groupby(~) in Pandas, call agg(~) method using either a list of aggregate functions or keyword arguments, where the keyword becomes the resulting column label.
- Getting n rows with smallest column value in each group in Pandas
  To get the n rows with smallest column value in each group in Pandas, first sort the DataFrame in ascending order by sort_values(~), and then perform a groupby(~) followed by head(n).
- Getting number of distinct rows in each group in Pandas DataFrame
  To get the number of distinct scores in each group in Pandas DataFrame, use 'nunique' with groupby(~) and agg(~).
- Getting size of each group in Pandas
  To get the size of each group in Pandas, use the groups' size() method.
- Getting specific group after groupby in Pandas
  To get a specific group after calling groupby(~) in Pandas, use the get_group(~) method, which returns a DataFrame.
- Getting the first row of each group in Pandas
  To get the first row of each group in Pandas, call first() after calling groupby(~).
- Getting the last row of each group in Pandas
  To get the last row of each group in Pandas, call last() after calling groupby(~).
- Getting the top n rows with largest column value in each group in Pandas
  To get the top n rows with the largest column value in each group in Pandas, use the DataFrame's sort_values method, then group by the column, and finally use the head function to fetch the top n rows.
- Getting unique values of each group in Pandas
  To get the number of unique values in each group in Pandas, use nunique(~) after calling groupby(~).
- Grouping by multiple columns in Pandas
  To group by multiple columns in Pandas, pass in an array of column labels to groupby(~).
- Grouping without turning group column into index in Pandas
  To perform grouping without turning the group column into an Index in Pandas, set as_index=False for the groupby(~) method.
- Merging rows within a group together in Pandas
  To merge rows within a group together in Pandas we can use the agg(~) method together with the join(~) method to concatenate the row values.
- Naming columns after aggregation in Pandas DataFrame
  To name columns after aggregation in Pandas DataFrame, use named arguments in the agg(~) method.
- Sorting values within groups in Pandas
  To sort values within groups in Pandas, first sort the DataFrame by sort_values(~), and then use the groupby(~) method.
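Several of the grouping recipes above can be tried on one small DataFrame. The following is a sketch under assumed data: the columns name and score echo the cumulative-sum recipe, while the values and the result labels total and best are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "name": ["alex", "alex", "bob", "bob", "bob"],
    "score": [3, 5, 2, 8, 4],
})

groups = df.groupby("name")

# Number of rows in each group.
sizes = groups.size()

# Running total of score within each group, aligned to the original rows.
cumulative = groups["score"].cumsum()

# Multiple aggregates of one column; each keyword becomes a column label.
summary = groups["score"].agg(total="sum", best="max")

# Share of each value within its group.
share = groups["score"].transform(lambda s: s / s.sum())

# Top 2 rows per group: sort descending first, then take the head of each group.
top2 = df.sort_values("score", ascending=False).groupby("name").head(2)

# Keep the group column as a regular column instead of the index.
flat = df.groupby("name", as_index=False)["score"].sum()
```

Note that transform is used for the share calculation because it returns a result with the same index as the input, which makes it easy to assign back as a new column.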
Data Manipulation Cookbook
- Adding a prefix to column values in Pandas
  To add a prefix to column values in Pandas DataFrame, directly use the + operator to concatenate a string to the column values (broadcasting), or use the Series' str.pad(~) method.
- Adding leading zeros to strings of a column in Pandas
  To add leading zeros to strings of a column in Pandas DataFrame, use the Series' str.zfill(~) method.
- Adding new column using lists in Pandas DataFrame
  To add a single column to a DataFrame, use the square-brackets syntax directly. To add multiple columns, use Pandas concat(~) method.
- Adding padding to a column of strings in Pandas
  To add padding to a column of strings in Pandas DataFrame, use the Series' str.pad(~) method.
- Bit-wise OR in NumPy and Pandas
  The pipe character (|) is used to perform a bit-wise OR operation in NumPy and Pandas.
- Changing column type to string in Pandas DataFrame
  To change a column type to string in Pandas DataFrame, use astype("string").
- Conditionally updating values of a DataFrame in Pandas
  To conditionally update values in a Pandas DataFrame, create a boolean mask and then pass it into loc, and finally perform assignment.
- Converting K and M to numerical form in Pandas DataFrame
  To convert "K" (thousand) and "M" (million) to numerical form in Pandas DataFrame, use df["A"].replace({"K":"*1e3", "M":"*1e6"}, regex=True).map(pd.eval).astype(int).
- Converting all object-typed columns to categorical type in Pandas DataFrame
  To convert all object-typed columns to categorical type in Pandas DataFrame, first obtain a list of column labels where the column is of type object, and then iterate over this list to perform the conversion to categorical.
- Converting column type to date in Pandas DataFrame
  To convert the column type from object or string to datetime in Pandas DataFrame, use the pd.to_datetime(~) method.
- Converting column type to float in Pandas DataFrame
  To convert the column type to float in Pandas DataFrame, either use the Series' astype() method, or use Pandas' to_numeric() method.
- Converting column type to integer in Pandas DataFrame
  To convert the column type to integer in Pandas DataFrame, either use the Series' astype() method or use Pandas' to_numeric() method.
- Converting string categories or labels to numeric values in Pandas
  To encode the string labels or categories with numeric integers in Pandas, use the codes property of pd.Categorical(~).
- Encoding categorical variables in Pandas
  To encode categorical variables, either using one-hot encoding or dummy coding, use Pandas get_dummies(~) method.
- Expanding lists vertically in a DataFrame in Pandas
  To expand lists vertically in a Pandas DataFrame, use the DataFrame's explode(~) method.
- Expanding strings vertically in a DataFrame in Pandas
  To expand strings vertically in Pandas, use the DataFrame's explode(~) method.
- Extracting numbers from column in Pandas DataFrame
  To extract numbers from column values in Pandas DataFrame, use str.extract(r'(\d+)').
- Filling missing value in Index of Pandas DataFrame
  To fill missing entries in the index of a Pandas DataFrame, use the reindex(~) method.
- Filtering column values using boolean masks in Pandas DataFrame
  To filter column values using boolean masks in Pandas DataFrame, use the Series' loc property.
- Logical AND operation in Pandas DataFrame
  Use & to perform a logical AND operation in Pandas DataFrame.
- Making DataFrame string column lowercase in Python
  We can make a DataFrame string column lowercase in Python using the str.lower() method.
- Mapping True and False to 1 and 0 respectively in Pandas DataFrame
  To map booleans True and False to 1 and 0 respectively in Pandas DataFrame, perform casting using astype(int).
- Mapping values of a DataFrame using a dictionary in Pandas
  To map values of a Pandas DataFrame using a dictionary, use the DataFrame's replace(~) method.
- Modifying a single value in a Pandas DataFrame
  To modify a single value in a Pandas DataFrame, use either iloc when using integer indices, or loc when using row and column labels.
- Removing characters from columns in Pandas DataFrame
  To remove characters from columns in Pandas DataFrame, use the replace(~) method.
- Removing comma from column values in Pandas DataFrame
  To remove comma from column values in Pandas DataFrame, use the Series' str.replace(~) method.
- Removing first n characters from column values in Pandas DataFrame
  To remove the first n characters from column values in a Pandas DataFrame, use the vectorised str slicing approach.
- Removing last n characters from column values in Pandas DataFrame
  To remove the last n characters from the values of column A in a Pandas DataFrame, use df["A"].str[:-n].
- Removing leading substring in Pandas DataFrame
  To remove the leading substring 'ab', use the Series' str.replace(~) method.
- Removing trailing substring in Pandas DataFrame
  To remove the trailing substring 'ab', use the Series' str.replace(~) method.
- Replacing infinities with another value in Pandas DataFrame
  To replace infinities (np.inf) with another value in a Pandas DataFrame, use the replace(~) method.
- Replacing values in a DataFrame in Pandas
  To replace values in a Pandas DataFrame, use the DataFrame's replace(~) method.
- Rounding values in Pandas
  To round values in Pandas use the DataFrame.round(~) method. The method takes a single parameter which specifies the number of decimals to round to.
- Sorting categorical columns in Pandas DataFrame
  To sort the DataFrame based on a categorical column, we first convert the column into a categorical type, and set the ordering using the second parameter. We then use sort_values(~) to perform the sort.
- Using previous row to create new columns in Pandas DataFrame
  To use previous rows to create new columns, first make a Series using Pandas' shift(~) method, and then perform computation using this Series.
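As a rough illustration of a handful of the manipulation recipes above, the sketch below works on invented data; the column names (code, item, price, flag) and the threshold of 1000 are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "code": ["7", "42", "105"],
    "item": ["ab12", "c345", "x6"],
    "price": ["1,200", "350", "9,999"],
    "flag": [True, False, True],
})

# Add leading zeros so every code is four characters wide.
df["code"] = df["code"].str.zfill(4)

# Add a prefix by broadcasting string concatenation over the column.
df["label"] = "id_" + df["code"]

# Extract the first run of digits; expand=False returns a Series.
df["digits"] = df["item"].str.extract(r"(\d+)", expand=False)

# Remove commas, then convert the column type to integer.
df["price"] = df["price"].str.replace(",", "", regex=False).astype(int)

# Map True and False to 1 and 0.
df["flag"] = df["flag"].astype(int)

# Conditionally update values: boolean mask passed into loc, then assignment.
df.loc[df["price"] > 1000, "price"] = 1000

# Use the previous row to build a new column.
df["prev_price"] = df["price"].shift(1)
```

The first row of prev_price is NaN because there is no previous row to shift from.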
Handling Missing Values
- Adding missing dates in Datetime Index in Pandas DataFrame
  To add the missing dates in DatetimeIndex, replace the index with a new index using reindex(~).
- Checking if a DataFrame contains any missing values in Pandas
  To check if a Pandas DataFrame contains any missing values, use df.isna().any(axis=None).
- Checking if a certain value in a DataFrame is missing (NaN) in Pandas
  To check if a certain value in a Pandas DataFrame is missing (NaN), use the methods isna() and at.
- Converting a Pandas column with missing values to integer type
  To convert column A to integer type in Pandas DataFrame, use df['A'].astype('Int64').
- Counting non-missing values in Pandas
  To count non-missing values in rows or columns of a Pandas DataFrame use the count(~) method.
- Counting number of missing values (NaN) in each column of a Pandas DataFrame
  To count the number of missing values (NaNs) of each column in a Pandas DataFrame, use df.isna().sum().
- Counting number of rows with missing values in Pandas DataFrame
  To count the number of rows that contain at least one missing value in Pandas DataFrame, use df.isna().any(axis=1).sum(). To count the number of rows with all missing values in Pandas DataFrame, use df.isna().all(axis=1).sum().
- Counting the number of missing values (NaNs) in each row of a Pandas DataFrame
  To count the number of missing values (NaNs) in each row of a Pandas DataFrame, use df.isna().sum(axis=1).
- Counting the total number of missing values (NaNs) of a Pandas DataFrame
  To count the total number of missing values (NaNs) in a Pandas DataFrame, use df.isna().values.sum().
- Filling missing values using another column values in Pandas DataFrame
  To fill the missing values in column A using values in column B in Pandas DataFrame, use df.loc[df["A"].isnull(), "A"] = df["B"].
- Filling missing values (NaNs) with the mean of the column in Pandas DataFrame
  To fill missing values (NaNs) with the mean of the column in Pandas DataFrame, use df.fillna(df.mean()).
- Finding columns with missing values (NaNs) in Pandas DataFrame
  To find columns with at least one NaN in a Pandas DataFrame, use df.isna().any(). To find columns that contain only NaN, use df.isna().all().
- Getting index of rows with missing values (NaNs) in Pandas DataFrame
  To get the index of rows with missing values in Pandas DataFrame, use temp = df.isna().any(axis=1), and then temp[temp].index.
- Getting index of rows without missing values in Pandas DataFrame
  To get the index of rows without missing values in a Pandas DataFrame, use df.dropna().index.
- Getting integer indexes of rows with NaN in Pandas DataFrame
  To get the integer indexes of rows with missing values (NaN) in Pandas DataFrame, use either the isna(~) method or all(~) method along with NumPy's where(~) method.
- Getting rows with missing values (NaNs) in Pandas DataFrame
  To get rows with missing values (NaNs) in a Pandas DataFrame, use df[df.isna().any(axis=1)].
- Getting rows with missing values (NaNs) in certain columns in Pandas DataFrame
  To get rows with missing values in a specific column in Pandas DataFrame, use df[df[column_name].isna()].
- Mapping NaN values to 0 and non-NaN values to 1 in Pandas DataFrame
  To map NaN values to 0 and non-NaN values to 1 in Pandas DataFrame, use the methods notnull() and astype("int").
- Mapping NaN values to False and non-NaN values to True in Pandas DataFrame
  To map NaN values to False and non-NaN values to True in Pandas DataFrame, use the notnull() method.
- Removing columns where some rows contain missing values (NaNs) in Pandas DataFrame
  To remove columns where some rows contain missing values (NaN), use the Pandas DataFrame's dropna(~) method.
- Removing rows from a DataFrame with missing values (NaNs) in Pandas
  To remove rows from a Pandas DataFrame whose value for a specific column is missing (NaN), use the DataFrame's dropna(~) method.
- Replacing NaN with blank string in Pandas DataFrame
  To replace missing values (NaN) with a blank string in Pandas, use the DataFrame's fillna("") method.
- Replacing missing values (NaNs) for certain columns in Pandas DataFrame
  To replace missing values (NaNs) present in certain columns, use the Pandas DataFrame's fillna(~) method.
- Replacing missing values (NaNs) with preceding values in Pandas DataFrame
  To replace missing values (NaNs) with preceding values, use the Pandas DataFrame's fillna(method="ffill") method (deprecated since pandas 2.1; the ffill() method is the direct equivalent).
- Replacing all missing values (NaNs) of a Pandas DataFrame
  To replace all missing values (NaNs) in a Pandas DataFrame, use the DataFrame's fillna(~) method.
- Replacing all NaN values with zeros in a Pandas DataFrame
  To replace all missing values (NaN) with zeros in a Pandas DataFrame, use the fillna(~) method.
- Replacing missing values in Pandas DataFrame
  To replace missing values (NaN) in Pandas DataFrame, use the fillna(~) method.
- Replacing missing values with constants in Pandas DataFrame
  To replace missing values with a constant in Pandas DataFrame, use the fillna(~) method.
- Replacing values with NaNs in Pandas DataFrame
  To replace values with NaNs, use the Pandas DataFrame's replace(~) method.
- Using interpolation to fill missing values (NaNs) in Pandas DataFrame
  To fill missing values using interpolation in Pandas, use the DataFrame's interpolate(~) method.
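The one-liners quoted in this section can be checked against a small DataFrame. This is only a sketch with made-up values in columns A and B; it uses ffill() rather than fillna(method="ffill") so that it runs on recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [10.0, 20.0, np.nan, 40.0],
})

# Does the DataFrame contain any missing value at all?
has_missing = df.isna().any(axis=None)

# Missing values per column, per row, and in total.
per_column = df.isna().sum()
per_row = df.isna().sum(axis=1)
total = df.isna().values.sum()

# Rows that contain at least one missing value.
rows_with_nan = df[df.isna().any(axis=1)]

# Fill with the column mean, or with the preceding value.
filled_mean = df.fillna(df.mean())
filled_forward = df.ffill()

# Nullable integer type keeps the missing entries as <NA>.
nullable_int = df["A"].astype("Int64")
```

The capitalised "Int64" is the nullable integer type; the lowercase "int64" would raise an error here because NaN cannot be stored in a plain integer column.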
Miscellaneous Cookbook
- Adjusting number of rows that are printed in Pandas DataFrame
  To adjust the number of rows of a DataFrame that are printed in Pandas, use pd.set_option('display.max_rows', n) where n is the number of rows you want to show.
- Appending DataFrame to an existing CSV file in Pandas
  To append a DataFrame to an existing CSV file in Pandas, use the to_csv(~, mode="a") method.
- Checking differences between two indexes in Pandas
  To check the differences between two indexes in Pandas use the Index.difference(~) method.
- Checking if a DataFrame is empty in Pandas
  To check if a Pandas DataFrame is empty, use the DataFrame's empty property. An empty DataFrame is one that contains no values.
- Checking if a variable is a DataFrame in Pandas
  To check if a variable is a DataFrame in Pandas, use the built-in isinstance(~) method.
- Checking if index is sorted in Pandas
  To check if the index of a DataFrame is sorted in ascending order use the is_monotonic_increasing property. Similarly, to check for descending order use the is_monotonic_decreasing property.
- Checking if value exists in Index in Pandas DataFrame
  To check if a value exists in the Index of a Pandas DataFrame, use the in keyword on the index property.
- Checking memory usage of DataFrame in Pandas
  To check the memory usage of a DataFrame in Pandas we can use the info(~) method or memory_usage(~) method. The info(~) method shows the memory usage of the whole DataFrame, while the memory_usage(~) method shows memory usage by each column of the DataFrame.
- Checking whether a Pandas object is a view or a copy
  To check whether a Pandas object is a view or a copy, use the _is_view property.
- Concatenating a list of DataFrames in Pandas
  To concatenate a list of DataFrames in Pandas either vertically or horizontally, use the concat(~) method.
- Converting DataFrame to a list of dictionaries in Pandas
  To convert a DataFrame to a list of dictionaries in Pandas, use the DataFrame's to_dict(orient="records") method.
- Converting DataFrame to list of tuples in Pandas
  To convert a DataFrame df into a list of tuples in Pandas, use list(df.itertuples(index=False)).
- Converting a DataFrame to a Series in Pandas
  To convert a DataFrame into a Series, use the squeeze() method, which reduces a DataFrame with a single row or column to a Series.
- Converting a DataFrame to a list in Pandas
  To convert a Pandas DataFrame into a 2D standard Python list, use the values property followed by the tolist method.
- Counting the number of negative values in Pandas DataFrame
  To count the total number of negative values in Pandas DataFrame df, call (df < 0).sum().sum().
- Creating a Pandas DataFrame using cartesian product of two DataFrames
  To create a new Pandas DataFrame using the cartesian product of two DataFrames, use the method merge(~) with the parameter how='cross'.
- Displaying DataFrames side by side in Pandas
  To print DataFrames side-by-side in Pandas, set the inline option when calling display_html(~).
- Displaying full non-truncated DataFrame values in Pandas
  To display full non-truncated DataFrame values use the pd.set_option(~) method.
- Drawing frequency histogram of Pandas DataFrame column
  To draw a frequency histogram of a Pandas DataFrame, use the plt.hist of the matplotlib library.
- Exporting Pandas DataFrame to PostgreSQL table
  To export a Pandas DataFrame to a PostgreSQL table, connect to the database using the create_engine(~) method of the sqlalchemy library, and then use the DataFrame's to_sql(~) method.
- Highlighting Pandas DataFrame cell based on value in Jupyter Notebook
  To highlight a Pandas DataFrame cell based on value in Jupyter Notebook, use df.style.applymap(highlighter) (renamed to df.style.map(highlighter) in pandas 2.1).
- Highlighting a particular cell of a DataFrame in Pandas
  To highlight a particular cell of a Pandas DataFrame, use the DataFrame's style.apply(~) method.
- How to solve "ValueError: If using all scalar values, you must pass an index" in Pandas
  To solve "ValueError: If using all scalar values, you must pass an index" in Pandas, either pass a list of values instead, or define an index.
- Importing BigQuery table as Pandas DataFrame
  To import a BigQuery table as a DataFrame, Pandas offers a built-in method called read_gbq that takes as arguments a query string as well as a path to the JSON credential file for authentication (deprecated since pandas 2.2 in favour of pandas_gbq.read_gbq).
- Plotting two columns of Pandas DataFrame
  To plot two columns of Pandas DataFrame, import matplotlib and extract the columns as a Series.
- Printing DataFrame on a single line in Pandas
  To print a DataFrame on a single line instead of across multiple lines use the pd.set_option(~) method setting the 'expand_frame_repr' setting to False.
- Printing DataFrame without index in Pandas
  To print a DataFrame without the index in Pandas, use the to_string(~) method.
- Printing DataFrames in tabular format in Pandas
  To print DataFrames in tabular (table) format in Pandas, import the tabulate library and use the tabulate method.
- Randomly splitting DataFrame into multiple DataFrames of equal size in Pandas
  To randomly split a DataFrame into multiple DataFrames of equal size in Pandas, shuffle the rows using the sample(~) method and then call NumPy's array_split(~) method.
- Reducing DataFrame memory size in Pandas
  There are two main ways to reduce DataFrame memory size in Pandas without necessarily compromising the information contained within the DataFrame:
  - Use smaller numeric types
  - Convert object columns to categorical columns
- Saving Pandas DataFrame as Excel file
  To save a Pandas DataFrame as an Excel file, use the DataFrame's to_excel(~) method.
- Saving Pandas DataFrame as feather file
  To save the DataFrame as a feather file, use df.to_feather("file_name.feather").
- Saving a Pandas DataFrame as a CSV file
  To save a Pandas DataFrame as a CSV file, use the DataFrame's to_csv(~) method.
- Setting all values to zero in Pandas DataFrame
  To set all values to zero in Pandas DataFrame, use the iloc property like df.iloc[:] = 0.
- Showing all dtypes without truncation in Pandas DataFrame
  To show all dtypes without truncation in Pandas DataFrame, print df.dtypes inside a with pd.option_context('display.max_rows', None) block.
- Splitting DataFrame into multiple DataFrames based on value in Pandas
  To split a DataFrame into multiple DataFrames based on values in a column in Pandas, perform a groupby(~) on the column, and then call the methods tuple(~) and dict(~).
- Splitting DataFrame into smaller equal-sized Pandas DataFrames
  To split a Pandas DataFrame into smaller equal-sized DataFrames, use NumPy's array_split(~) method.
- Writing Pandas DataFrame to SQLite
  To write a Pandas DataFrame to SQLite, open a connection using the sqlite3 library and then use the DataFrame's to_sql(~) method.
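Some of the conversion, splitting and export recipes above can be sketched as follows. The data, the table name demo and the in-memory SQLite database are assumptions for illustration. As a deliberate variation, the split applies NumPy's array_split(~) to row positions rather than to the DataFrame itself, which avoids relying on how NumPy handles DataFrame inputs.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1, -2, 3, -4], "B": [5, 6, -7, 8]})

# Conversions to plain Python structures.
records = df.to_dict(orient="records")
tuples = [tuple(row) for row in df.itertuples(index=False)]
nested = df.values.tolist()

# A single-column DataFrame squeezed down to a Series.
series = df[["A"]].squeeze()

# Total number of negative values.
negatives = int((df < 0).sum().sum())

# Cartesian product of two DataFrames (requires pandas 1.2 or later).
left = pd.DataFrame({"x": [1, 2]})
right = pd.DataFrame({"y": ["a", "b", "c"]})
cross = left.merge(right, how="cross")

# Shuffle the rows, then split the row positions into two equal chunks.
shuffled = df.sample(frac=1, random_state=0)
chunks = np.array_split(np.arange(len(shuffled)), 2)
parts = [shuffled.iloc[chunk] for chunk in chunks]

# Write to an in-memory SQLite database and read the table back.
conn = sqlite3.connect(":memory:")
df.to_sql("demo", conn, index=False)
round_trip = pd.read_sql("SELECT * FROM demo", conn)
conn.close()
```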
Multi-index Operations Cookbook
- Combining multiple DataFrames into one DataFrame in Pandas
  To combine multiple DataFrames into a single DataFrame, use the pd.concat(~) method.
- Resetting MultiIndex of a DataFrame in Pandas
  To reset the multi-index of a DataFrame in Pandas, use the DataFrame's reset_index() method.
- Setting multi-index using two columns of a DataFrame in Pandas
  To set a multi-index to a DataFrame using two of its columns, use the DataFrame's set_index(~) method.
- Sorting a multi-index DataFrame in Pandas
  To sort a multi-index DataFrame, use the DataFrame's sort_index(~) method.
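The set, sort and reset steps above fit together as a short round trip. The sketch below assumes a made-up DataFrame with columns region, year and sales.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "year": [2021, 2020, 2020, 2021],
    "sales": [5, 7, 3, 9],
})

# Build a two-level index from two columns.
indexed = df.set_index(["region", "year"])

# Sort by the outer level first, then the inner level.
ordered = indexed.sort_index()

# Turn the index levels back into regular columns.
restored = ordered.reset_index()
```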
Row and Column Operations Cookbook
- Adding a column that contains the difference of consecutive rows in Pandas DataFrame
  To add a column that contains the difference of consecutive rows in Pandas DataFrame, use the diff(~) method.
- Adding a constant number to DataFrame columns in Pandas
  We can add a constant number to DataFrame columns in Pandas using the + operator or the add(~) method.
- Adding an empty column to a DataFrame in Pandas
  To add an empty column to a Pandas DataFrame, use the DataFrame's assign(~) method.
- Adding column to DataFrame with constant values in Pandas
  To add a column of constants in Pandas DataFrame, directly use the square bracket notation, [].
- Adding new columns to a DataFrame in Pandas
  To add new columns to a DataFrame in Pandas, use the DataFrame's assign(~) method.
- Appending rows to a Pandas DataFrame
  To append a single row (a list or Series) or multiple rows to a Pandas DataFrame, use Pandas' concat(~) method; the DataFrame's append(~) method was deprecated in pandas 1.4 and removed in pandas 2.0.
- Applying a function that takes as input multiple column values in Pandas
  To apply a function that takes as input multiple column values in Pandas, use the DataFrame's apply(~) method.
- Applying a function to a single column of a DataFrame in Pandas
  To apply a function to a single column in Pandas DataFrame, use [] syntax rather than using the apply method.
- Changing column type to categorical in Pandas
  To change column type to categorical in Pandas, use the DataFrame's astype("category") method.
- Changing the name of a DataFrame's index in Pandas
  To change the label of a specific index, use the Pandas DataFrame's rename method, or the index property.
- Changing the order of columns in a Pandas DataFrame
  To change the order of two columns in Pandas DataFrame, either use [] syntax like df = df[["B","A","C"]], or use the reindex method.
- Changing the type of a DataFrame's column in Pandas
  To change the data type of a DataFrame's column in Pandas, use the Series' astype(~) method.
- Changing the type of a DataFrame's index in Pandas
  To change the type of a DataFrame's index in Pandas, use the DataFrame.index.astype(~) method.
- Checking if a DataFrame column contains some values in Pandas
  To check if a DataFrame column contains some values in Pandas, chain the methods isin(~) and any(~).
- Checking if a column exists in a DataFrame in Pandas
  To check if column A exists in a DataFrame df in Pandas, use "A" in df.columns.
- Checking if a value exists in a DataFrame in Pandas
  To check if a value exists in the Pandas DataFrame, use the built-in in operator against the DataFrame's values property.
- Checking if column is numeric in Pandas DataFrame
  To check if a column is numeric in a Pandas DataFrame, use df['A'].dtype.kind in 'iufc'.
- Checking the data type of columns in a Pandas DataFrame
  To check the data type of columns in a Pandas DataFrame, use the DataFrame's dtypes property.
- Checking whether column values match or contain a pattern in Pandas DataFrame
  To check whether column values match or contain a pattern in Pandas DataFrame, use the Series' str.contains(~) method.
- Combining two columns as a single column of tuples in Pandas
  To combine two columns as a single column of tuples in Pandas, use the DataFrame's apply(tuple) method.
- Combining two columns of type string in a Pandas DataFrame
  To combine columns A and B of type string in Pandas DataFrame to form a new column C, use df["C"] = df["A"] + df["B"].
- Computing the average of columns in Pandas DataFrame
  To compute the average of columns in Pandas DataFrame, use the mean(~) method.
- Computing the correlation between columns in Pandas DataFrame
  To compute the correlation between columns in Pandas DataFrame, use the corr(~) method.
- Concatenating DataFrames horizontally in Pandas
  To concatenate DataFrames horizontally in Pandas, use the concat(~) method with axis=1.
- Concatenating DataFrames vertically in Pandas
  To concatenate DataFrames vertically in Pandas, use the concat(~) method.
- Converting Index to list in Pandas
  To convert an Index to a Python list in Pandas, use Index's tolist() method.
- Converting a row to column labels in Pandas
  To convert a row to column labels in Pandas DataFrame, directly assign the DataFrame's columns property to the reference of the row (Series).
- Converting categorical type to int in Pandas DataFrame
  To change the column type from category to int in Pandas DataFrame, use the factorize(~) method.
- Converting column to list in Pandas
  To convert a column into a Python list in Pandas, either use Series' to_list(~) method or use Python's built-in list(~) method.
- Converting percent strings into numeric in Pandas DataFrame
  To convert percent strings into numeric type in Pandas DataFrame, first strip the trailing % character and then perform type conversion using astype(float).
- Converting the index of a DataFrame into a column
  To convert the index of a Pandas DataFrame into a column, use the DataFrame's reset_index() method.
- Counting duplicate rows in Pandas DataFrame
  To count the number of duplicate rows in a Pandas DataFrame, use the DataFrame's duplicated(~) method.
- Counting number of rows with no missing values in Pandas DataFrame
  To count the number of rows with no missing values in a Pandas DataFrame, use the combination of methods notna, all and sum.
- Counting the occurrence of values in columns of a Pandas DataFrame
  To count the occurrence of a specific value in a column of a Pandas DataFrame, first obtain a boolean mask and then use the sum method to add up all the boolean Trues.
- Counting unique values in a column of a Pandas DataFrame
  To count the number of unique values in a column of a DataFrame in Pandas, use the nunique(~) method.
- Counting unique values in rows of a Pandas DataFrame
  To count the number of unique values in a row of a DataFrame in Pandas, use the nunique(~) method.
- Creating a new column based on other columns in Pandas DataFrame
  To create a new column based on other columns in a Pandas DataFrame, use either column arithmetic for the fastest performance or the assign(~) method for more complicated operations.
- Creating new column using if, elif and else in Pandas DataFrame
  To create new columns using if, elif and else in Pandas DataFrame, use either the apply method or the loc property.
- Describing certain columns of a DataFrame in Pandas
  To describe certain columns, as opposed to all columns, in Pandas DataFrame, use the [] notation to first extract the desired columns and then use the describe(~) method.
- Dropping columns whose label contains a substring in Pandas
  To drop columns whose label contains a specific substring in Pandas DataFrame, use df.loc[:, ~df.columns.str.contains(substring)].
- Getting column values based on another column values in a DataFrame in Pandas
  To get column values based on another column values in Pandas DataFrame, use the query(~) method and then extract the desired columns.
- Getting columns as a copy in Pandas DataFrame
  To get columns as a copy in Pandas DataFrame, use the copy(~) method.
- Getting columns whose label contains a substring in Pandas
  To get columns whose label contains a substring in Pandas, use the DataFrame's filter(~) method.
- Getting maximum value in columns of Pandas DataFrame
  To get the maximum value in columns in Pandas DataFrame, use the max(~) method.
- Getting maximum value of entire DataFrame in Pandas
  To get the maximum value of entire DataFrame df in Pandas, use df.max().max().
- Getting mean of columns in Pandas DataFrame
  To get the mean of columns in Pandas DataFrame, use the mean(~) method.
- Getting median of columns in Pandas DataFrame
  To get the median of columns in Pandas DataFrame, use the median(~) method.
- Getting minimum value in columns in Pandas DataFrame
  To get the minimum value of columns in Pandas DataFrame, use the min() method.
- Getting row label when calling apply in Pandas
  To get the row label when calling apply in Pandas, use the name property of the row.
- Getting row labels as list in Pandas DataFrame
  To get the row labels as a list in Pandas, use list(df.index) where df is the DataFrame.
- Getting rows where column value contains any substring in a list in Pandas DataFrame
  To get all rows where the string in a column contains certain substrings in Pandas DataFrame, construct a regular expression using the pipe character (|), and then use the str.contains(~) method.
- Getting the name of a DataFrame's index in Pandas
  To get the name of a DataFrame's index in Pandas, use DataFrame.index.name.
- Getting the type of DataFrame's index
  To get the type of the Pandas DataFrame's index, use DataFrame.index.dtype.
- Grouping DataFrame rows into lists in Pandas
  To pack the values in one column (B) into a list for each group present in another column (A) in Pandas DataFrame, use df.groupby('A')["B"].agg(list), where df is the source DataFrame.
- Inserting column at a specific location in Pandas
  To insert a column at a specific location in Pandas, use the DataFrame's insert(~) method.
- Iterating over each column of a DataFrame in Pandas
  To iterate over each column of a DataFrame in Pandas, use the DataFrame's items() method (named iteritems() before it was removed in pandas 2.0), which returns an iterator over the column labels and column values.
- Iterating over each row of a DataFrame in Pandas
  To iterate over each row of a DataFrame in Pandas, use the DataFrame's iterrows() method, which returns an iterator over the row labels and row values.
- Modifying rows of a Pandas DataFrame
  To modify a row of a Pandas DataFrame, use either the DataFrame's loc or iloc property.
- Modifying values in Index of Pandas DataFrame
  To modify values in the Index of a Pandas DataFrame, use the rename(~) method or perform direct assignment.
- Removing columns from a DataFrame in Pandas
  To remove a column from a DataFrame in Pandas, use the DataFrame's drop(~) method.
- Removing columns using column labels in Pandas DataFrame
  To remove columns using column labels in Pandas DataFrame, use the drop(~) method.
- Removing columns using integer index in Pandas DataFrame
  To drop columns using integer index in Pandas DataFrame, call the drop(~) method.
- Removing columns with all missing values in Pandas DataFrame
  To remove columns with all missing values in Pandas, use the DataFrame's dropna(how="all", axis=1) method.
- Removing columns with some missing values in Pandas DataFrame
  To remove columns with some missing values in Pandas DataFrame, use the dropna(axis=1) method.
- Removing duplicate columns in Pandas DataFrame
  To remove duplicate columns in Pandas DataFrame, use the duplicated(~) method.
- Removing duplicate rows in Pandas DataFrame
  To remove duplicate rows from a Pandas DataFrame, use the drop_duplicates(~) method.
- Removing first n rows of a DataFrame in Pandas
  To remove the first n rows of a DataFrame in Pandas, use the iloc property.
- Removing multiple columns in Pandas DataFrame
  To remove multiple columns in Pandas DataFrame, use the drop(~) method.
- Removing prefix from column labels in Pandas DataFrame
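  To remove a prefix from column labels in Pandas DataFrame, use the str.lstrip(~) method. Note that lstrip(~) strips a set of characters rather than an exact prefix, so it can remove more than intended.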
- Removing rows at random without shuffling in Pandas DataFrame
  To remove rows at random without shuffling in Pandas DataFrame, first get an array of randomly selected row index labels, and then use the drop(~) method to remove the rows.
- Removing rows from a DataFrame based on column values in Pandas
  To remove rows from a Pandas DataFrame based on column values, use the DataFrame's query(~) method.
- Removing rows using integer index in Pandas DataFrame
  To remove rows using integer index in Pandas DataFrame, first get the name of the row index using iloc, and second use the drop(~) method to remove the row.
- Removing rows with all zeros in Pandas DataFrame
  To remove rows with all zeros in Pandas DataFrame, use df[~(df == 0).all(axis=1)] where df is the DataFrame.
- Removing suffix from column labels in Pandas DataFrame
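  To remove a suffix from column labels in Pandas DataFrame, use the str.rstrip(~) method. Note that rstrip(~) strips a set of characters rather than an exact suffix, so it can remove more than intended.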
- Renaming columns of a DataFrame in Pandas
  To rename the columns of a DataFrame in Pandas, use either the rename(~) method or the columns property.
- Replacing substring in column values in Pandas DataFrame
  To replace substrings in column values in Pandas DataFrame, use the Series' str.replace(~) method.
- Returning multiple columns using the apply function in Pandas
  To return multiple columns using the apply(~) function in Pandas, make the parameter function return a Series.
- Reversing the order of rows in Pandas DataFrame
  To reverse the order of rows in Pandas DataFrame, use iloc[::-1].
- Setting a new index of a DataFrame in Pandas
  To replace the index of a DataFrame in Pandas, directly assign a new array to the index property or use the DataFrame's set_index method.
- Setting an existing column as the new index of a Pandas DataFrame
  To set an existing column as the new index of a Pandas DataFrame, use the set_index(~) method.
- Setting column as the index in Pandas DataFrame
  To set a column as the index of a Pandas DataFrame, use the set_index(~) method.
- Setting integers as column labels in Pandas DataFrame
  To set the column labels to be incremental integers in Pandas DataFrame, use df.columns = range(0, df.columns.size).
- Showing all column labels in Pandas DataFrame
  To show all column labels of a Pandas DataFrame, access the values property of columns.
- Shuffling the rows of a Pandas DataFrame
  To shuffle the rows of a Pandas DataFrame, use the DataFrame's sample(frac=1) method.
- Sorting Pandas DataFrame alphabetically
  To sort a Pandas DataFrame alphabetically, use sort_values(~).
- Sorting DataFrame by column labels in Pandas
  To sort a DataFrame by column labels in Pandas, use the DataFrame's sort_index(axis=1) method.
- Sorting a Pandas DataFrame by column
  To sort a Pandas DataFrame by column, use the DataFrame's sort_values(~) method.
- Sorting a DataFrame by index in Pandas
  To sort a DataFrame by index in Pandas, use the DataFrame's sort_index(~) method.
- Splitting a column of strings into multiple columns in Pandas
  To split a column of strings into multiple columns in Pandas DataFrame, use Series' split string method.
- Splitting column of lists into multiple columns in Pandas
  To split a column, which contains lists, into multiple columns, use pd.concat([df, df["A"].apply(pd.Series)], axis=1).
- Splitting dictionary into separate columns in Pandas DataFrame
  To split dictionaries into separate columns in Pandas DataFrame, use the apply(pd.Series) method.
- Stripping substrings from values in columns in Pandas
  To strip substrings from values in a column of a Pandas DataFrame, use the str.strip(~) helper method.
- Stripping whitespace from columns in Pandas
  To strip whitespace from columns in Pandas we can use the str.strip(~) method or the str.replace(~) method.
- Stripping whitespaces in column labels in Pandas DataFrame
  To strip whitespaces in column labels in Pandas DataFrame, use the str.strip() method.
- Summing a column of a DataFrame in Pandas
  To compute the sum of columns in Pandas, use the DataFrame's sum(~) method.
- Summing rows of specific columns in Pandas
  To sum rows of specific columns of a Pandas DataFrame, say columns A and C, call df["A"] + df["C"] where df is the DataFrame.
- Swapping the rows and columns of a DataFrame in Pandas
  To swap the rows and columns of a DataFrame in Pandas, use the DataFrame's transpose(~) method.
- Unstacking certain columns only in Pandas DataFrame
  Pandas' unstack(~) method does not allow you to select specific columns to unstack. To unstack certain columns only, use the Pandas melt(~) method.
- Updating a row while iterating over the rows of a DataFrame in Pandas
  To update a row while iterating over the rows of a DataFrame in Pandas, use the itertuples(~) method along with the at property for value assignment.
- Updating rows based on column values in Pandas DataFrame
  To update rows based on column values in Pandas DataFrame, use the loc property.
- Using apply method in parallel to Pandas DataFrame
  To run Pandas' apply(~) in parallel, use Dask, which is an easy-to-use library that performs Pandas' operations in parallel by splitting up the DataFrame into smaller partitions.
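
A few of the recipes in this section are easier to follow with a concrete example. The sketch below is illustrative only and uses made-up data and labels: it combines two string columns, appends a row with concat(~) (the DataFrame's append(~) method no longer exists in pandas 2.0 and later), packs grouped values into lists, and shows why str.lstrip(~) can over-strip when removing a prefix. The anchored regular expression is one possible alternative, not necessarily the approach the linked recipe takes.

```python
import pandas as pd

# Combine two string columns into a new column with the + operator.
df = pd.DataFrame({"A": ["a", "b"], "B": ["x", "y"]})
df["C"] = df["A"] + df["B"]

# Append a row by concatenating a one-row DataFrame.
new_row = pd.DataFrame({"A": ["c"], "B": ["z"], "C": ["cz"]})
df = pd.concat([df, new_row], ignore_index=True)

# Pack the values of column B into a list for each group in column A.
g = pd.DataFrame({"A": ["k1", "k1", "k2"], "B": [1, 2, 3]})
packed = g.groupby("A")["B"].agg(list)

# lstrip treats its argument as a set of characters, not a prefix,
# so it keeps stripping while the leading character is in {p, r, e, _}.
labels = pd.Index(["pre_price", "pre_rate"])
over_stripped = labels.str.lstrip("pre_")

# An anchored regex removes only the exact leading "pre_".
exact = labels.str.replace(r"^pre_", "", regex=True)
```
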
Selecting Data Cookbook
- Accessing a single value of a DataFrame in Pandas
  To access a single value of a DataFrame in Pandas, use the DataFrame.iloc property (via integer indices), or use the DataFrame.loc property (via row and column labels).
- Accessing columns of a DataFrame using column labels in Pandas
  To access specific columns of a Pandas DataFrame with their columns labels, directly use DataFrame[~] or use the DataFrame.loc property.
- Accessing columns of a DataFrame using integer indices in Pandas
  To access columns of a DataFrame using integer indices in Pandas, use the DataFrame.iloc property.
- Accessing rows of a DataFrame using integer indices in Pandas
  To access rows of a DataFrame using integer indices in Pandas, use the DataFrame.iloc property.
- Accessing rows of a DataFrame using row labels in Pandas
  To access specific rows of a Pandas DataFrame with row labels, use the DataFrame.loc property.
- Accessing the first n rows of a Pandas DataFrame
  To access the first n rows of a Pandas DataFrame, use either the DataFrame's head(n) method or iloc property.
- Accessing the last n rows of a Pandas DataFrame
  To access the last n rows of a Pandas DataFrame, use the DataFrame's tail(n) method.
- Accessing values of a multi-index DataFrame in Pandas
  To access values of a multi-index DataFrame, use the loc property.
- Adding prefix to column labels in Pandas DataFrame
  To prepend a prefix to column labels in Pandas, use the DataFrame's add_prefix(~) method.
- Adding suffix to column labels in Pandas DataFrame
  To append a suffix to column labels in Pandas, use the DataFrame's add_suffix(~) method.
- Converting two columns into a dictionary in Pandas DataFrame
  To convert two columns into a dictionary in Pandas DataFrame, first extract the two columns as Series, and then pass them into dict(zip(~)).
- Excluding columns based on type in Pandas DataFrame
  To exclude columns based on the data type in Pandas DataFrame, use the DataFrame's select_dtypes(~) method with the exclude parameter.
- Extracting values of a DataFrame as a Numpy array in Pandas
  To extract all values of a DataFrame as a Numpy array, use the DataFrame's values property or the to_numpy(~) method.
- Getting a list of all the column labels of a Pandas DataFrame
  To get the column labels of a Pandas DataFrame as a Python standard list, use list(df.columns).
- Getting all columns except one in Pandas DataFrame
  To get all columns except one in Pandas, use the DataFrame's drop(~) method.
- Getting all duplicate rows in Pandas
  To get all duplicate rows as a Pandas DataFrame, use the DataFrame's duplicated(~) method.
- Getting all numeric columns of a DataFrame in Pandas
  To get all numeric columns of a Pandas DataFrame, use the select_dtypes(~) method.
- Getting all unique values of columns in Pandas
  To get all unique values of certain columns in a DataFrame, use Pandas' unique(~) method.
- Getting column label of max value in each row in Pandas DataFrame
  To get the column label of the max value in each row of a Pandas DataFrame, use the idxmax(axis=1) method.
- Getting column label of minimum value in each row in Pandas DataFrame
  To get the column label of the minimum value in each row in Pandas DataFrame, use the idxmin(axis=1) method.
- Getting columns by data type in Pandas DataFrame
  To get columns by data type in Pandas, use the DataFrame's select_dtypes(~).
- Getting columns using integer index in Pandas DataFrame
  To get columns using integer index in Pandas DataFrame, use the iloc property.
- Getting earliest or latest date from Pandas DataFrame
  To get the earliest or latest date from Pandas DataFrame, use min(~) and max(~).
- Getting every nth row in Pandas DataFrame
  To get every nth row in Pandas DataFrame, use df.iloc[::n, :].
- Getting first row value of a column in Pandas DataFrame
  To get the first row value of a column in Pandas DataFrame, use the iloc property.
- Getting index of Series where value is True
  To get the index of values that equal True in Pandas, use s[s].index where s is a Series.
- Getting indexes of rows matching conditions in Pandas DataFrame
  To get the indexes of rows matching a certain condition in Pandas DataFrame, use the query(~) method to perform filtering, and then fetch the corresponding index using the index property.
- Getting integer index of a column using its column label in Pandas
  To get the integer index of a column using its column label of a Pandas DataFrame, use the get_loc method of the Index object.
- Getting integer index of rows based on column values in Pandas DataFrame
  To get the integer indexes of rows based on column values in Pandas DataFrame, use NumPy's where(~) method.
- Getting multiple columns in Pandas DataFrame
  To get multiple columns in Pandas DataFrame, use either the [] syntax directly or use properties like loc and iloc.
- Getting number of columns of a Pandas DataFrame
  To get the total number of columns of a Pandas DataFrame, use the shape property, which returns a tuple containing the number of rows and columns.
- Getting row with largest index value in Pandas DataFrame
  To get rows with the largest index value in Pandas DataFrame df, use df.iloc[df.index.argmax()].
- Getting row with smallest index value in Pandas DataFrame
  To get rows with the smallest index value in Pandas DataFrame df, use df.iloc[df.index.argmin()].
- Getting rows based on multiple column values in Pandas DataFrame
  To get rows based on multiple column values in Pandas DataFrame, use the query(~) method.
- Getting rows except some in Pandas DataFrame
  To get all rows in a Pandas DataFrame except those at certain integer indexes or with certain labels, use the drop(~) method.
- Getting rows from a DataFrame based on column values in Pandas
  To get rows from a Pandas DataFrame based on column values, use the DataFrame's query(~) method.
- Getting rows that are not in other DataFrame in Pandas
  To get rows from a DataFrame that are not in another DataFrame in Pandas, perform a left join on all the columns using merge(~), and then use the query(~) method to extract the non-matched rows.
- Getting rows using OR statement in Pandas DataFrame
  To get rows using OR statement in Pandas DataFrame, use the query(~) method.
- Getting rows where column values are of specific length in Pandas DataFrame
  To get rows where column values have a specific length (say, 3) in Pandas DataFrame, use df.query("A.str.len() == 3", engine="python").
- Getting rows where value is between two values in Pandas DataFrame
  To get rows where value is between two values in Pandas DataFrame, use the query(~) method.
- Getting rows where values do not contain substring in Pandas DataFrame
  To get rows where values do not contain a substring in Pandas DataFrame, use str.contains(~) with the negation operator ~.
- Getting shortest and longest strings in Pandas DataFrame
  To get the shortest or longest strings in Pandas DataFrame column, first compute the shortest or longest string length using Series.str.len() method, and then obtain a boolean mask to filter the shortest or longest strings.
- Getting the column labels of a DataFrame in Pandas
  To get the column labels of a DataFrame in Pandas, use the DataFrame.columns property.
- Getting the first column in Pandas DataFrame
  To get the first column in Pandas DataFrame, use the iloc property.
- Getting the index of a DataFrame in Pandas
  To get the index of a DataFrame in Pandas, use the DataFrame.index property.
- Getting the length of the longest string in a column in Pandas DataFrame
  To get the length of the longest string in a column (A) of a Pandas DataFrame (df), use df["A"].str.len().max().
- Getting the longest string in a column in Pandas DataFrame
  To get the longest string in a column in Pandas DataFrame, first get the length of each string using the str.len() method, and then use NumPy's where(~) method to get the integer indexes of the maximums. Finally, pass these integer indexes into iloc to get the desired rows.
- Getting the row with the maximum column value in Pandas DataFrame
  To get the row with the maximum column value in Pandas, use the DataFrame's nlargest(~) method.
- Getting the row with the minimum column value in Pandas DataFrame
  To get the row with the smallest column value in Pandas, use the DataFrame's nsmallest(~) method.
- Getting the shape of a DataFrame in Pandas
  To get the shape of a DataFrame in Pandas, use the DataFrame.shape property.
- Getting the total number of rows of a Pandas DataFrame
  To get the total number of rows of a Pandas DataFrame, use the shape property, which is a tuple containing the number of rows and columns.
- Getting the total number of values in a Pandas DataFrame
  To get the total number of values in a Pandas DataFrame, use the DataFrame's size property.
- Making column labels all lowercase in Pandas DataFrame
  To make column labels of a Pandas DataFrame all lowercase, use df.columns = df.columns.str.lower() where df is the DataFrame.
- Making column labels all uppercase in Pandas DataFrame
  To make column labels of a Pandas DataFrame all uppercase, use df.columns = df.columns.str.upper() where df is the DataFrame.
- Randomly select rows based on a condition from a Pandas DataFrame
  To randomly select rows based on a specific condition in Pandas, first use DataFrame.query(~) method to extract rows that meet the condition, and then use DataFrame.sample(~) method to randomly select n rows.
- Randomly selecting n columns from a DataFrame
  To randomly select n columns from a Pandas DataFrame, use the DataFrame's sample(~) method.
- Randomly selecting n rows from a DataFrame
  To select n rows from a DataFrame randomly, use the DataFrame's sample(~) method.
- Reassigning column values in Pandas DataFrame
  To reassign column values in Pandas DataFrame, use the [] syntax to get a view of the column, and then perform the assignment using =.
- Retrieving DataFrame column values as a NumPy array
  We can use the DataFrame.values property to return the values of a DataFrame column as a NumPy array.
- Selecting a single column as a Pandas DataFrame
  To get a column as a Pandas DataFrame instead of a Series, use df.loc[:,['A']].
- Selecting columns of a DataFrame using regex in Pandas
  We can select columns of a DataFrame using regex through the filter(~) method. The method applies filtering based on the labels of the columns/rows, and not on the actual data.
- Selecting columns that do not begin with certain prefix in Pandas DataFrame
  To select columns that do not begin with a certain prefix in Pandas DataFrame, first get the positions of the columns that do not begin with the specified prefix using columns.str.startswith(~), and then use iloc to select those columns.
- Selecting columns with certain prefix in Pandas DataFrame
  To select columns that begin with a certain prefix in Pandas DataFrame, first fetch the locations of the columns that start with the prefix using columns.str.startswith(~), and then use iloc to select those columns.
- Selecting last column of Pandas DataFrame
  To select the last column of Pandas DataFrame, use df[df.columns[-1]].
- Selecting n rows with the smallest values for a column in Pandas
  To select the n rows with the smallest values for a column in Pandas, use the DataFrame's nsmallest(~) method.
- Selecting rows based on a condition in Pandas
  To extract rows from a DataFrame using a boolean mask in Pandas, simply use the [~] notation like df[mask].
- Selecting rows based on dates in Pandas
  In Pandas, to select rows based on dates, use DataFrame's query(~) method.
- Selecting rows from a Pandas DataFrame whose column values are NOT contained in a list
  To select rows from a Pandas DataFrame whose column values are NOT contained in a list, use the DataFrame's query(~) method.
- Selecting rows from a Pandas DataFrame whose column values are contained in a list
  To select rows from a Pandas DataFrame whose column values are contained in a list, use the DataFrame's query(~) method.
- Selecting rows from a Pandas DataFrame whose column values contain a substring
  To select rows from a Pandas DataFrame where certain column values contain a specific substring, use the column's str.contains(~) method.
- Selecting rows starting with substring in Pandas DataFrame
  To select rows of a Pandas DataFrame starting with a specified substring we can use the str.startswith(~) method.
- Selecting top n rows with the largest values for a column in Pandas
  To select the top n rows with the largest values in a column in Pandas, use the DataFrame's nlargest(~) method.
- Splitting Pandas DataFrame based on column values
  To split a Pandas DataFrame based on column values, first build a mask of booleans that indicate rows where condition is satisfied. Next, use df[mask] and df[~mask] to obtain two separate DataFrames.
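
The selection recipes above mostly reduce to a handful of patterns. Here is a minimal sketch on a made-up DataFrame (the labels x_a, x_b and y are assumptions for illustration). It uses loc with a boolean array for the prefix selection, which is one of several ways to do it.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"x_a": [1, 5, 3], "x_b": [4, 2, 6], "y": ["u", "v", "w"]})

# Split the DataFrame in two with a boolean mask and its negation.
mask = df["x_a"] > 2
above, below = df[mask], df[~mask]

# Filter rows on several column values with query.
queried = df.query("x_a > 2 and x_b < 5")

# Select the columns whose label starts with a given prefix.
x_cols = df.loc[:, df.columns.str.startswith("x_")]

# Row with the largest value in column x_b.
top = df.nlargest(1, "x_b")
```
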
Time Series Cookbook
- Adding new column containing the difference between two date columns in Pandas DataFrame
  To add a new column containing the difference between two date columns in Pandas DataFrame, perform date arithmetics on the columns using the standard binary operators.
- Combining columns containing date and time in Pandas
  To combine columns containing date and time in Pandas DataFrame, use string concatenation and then use to_datetime(~) to convert the datetime string to type datetime64.
- Combining columns of years, months and days in Pandas
  To combine the Year, Month and Day columns to form another column in Pandas DataFrame, perform string concatenation and then use the to_datetime(~) method.
- Converting DatetimeIndex to Series of datetime in Pandas
  To convert DatetimeIndex into a Series of type datetime64 in Pandas, pass it into the Series constructor.
- Converting UNIX timestamp to datetime in Pandas
  To convert a UNIX timestamp to datetime, use Pandas' to_datetime(~) method.
- Converting a DataFrame column of strings to datetime in Pandas
  To convert a DataFrame column of strings to datetime, use Pandas' to_datetime(~) method.
- Converting dates to strings in Pandas DataFrame
  To convert a date column to a Series of date strings in Pandas DataFrame, use the strftime(~) method.
- Converting datetime column to date and time columns in Pandas
  In Pandas, we first convert the datetime column into strings holding date and time information using strftime(~), and then use Series.str.split(~) to split the string into date and time columns.
- Converting index to datetime in Pandas
  To convert the index of a DataFrame to DatetimeIndex, use Pandas' to_datetime(~) method.
- Creating a column of dates in Pandas
  To create a column of dates in Pandas, use the date_range(~) method.
- Creating a range of dates in Pandas
  To create a range of dates in Pandas, use the date_range(~) method, which returns a DatetimeIndex.
- Extracting month and year from Datetime column in Pandas
  To extract the year and month of a Datetime column in Pandas, use dt.year and dt.month respectively.
- Getting all weekdays between two dates in Pandas
  To get all weekdays between two dates, use Pandas' bdate_range(~) method.
- Getting all weekends between two dates in Pandas
  To get all weekends between two dates, use Pandas' bdate_range(~) method.
- Getting day of week of date columns in Pandas DataFrame
  To get the day of week for each date in column in Pandas DataFrame, use dt.day_name().
- Getting day unit from date column in Pandas
  To get the day unit from a date column in Pandas DataFrame, use the dt.day property.
- Getting month unit from date column in Pandas
  To get the month unit from a date column in Pandas DataFrame, use the dt.month property.
- Getting name of months in date column in Pandas
  To get the name of months in date column in Pandas DataFrame, use dt.month_name().
- Getting week numbers from a date column in Pandas
  To get the week numbers from a date column in Pandas DataFrame, use the Series' dt.isocalendar().week property.
- Getting year unit from date column in Pandas
  To get the year unit from a date column in Pandas DataFrame, use the dt.year property.
- Modifying dates in Pandas
  To modify dates (Timestamp and datetime) in Pandas, use the replace(~) method.
- Offsetting datetime in Pandas
  To offset datetime in Pandas, initialise a Timedelta object and perform date arithmetic.
- Removing time unit from dates in Pandas
  To remove the time unit from dates in Pandas, use the normalize() method.
- Setting date to beginning of month in Pandas
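  To set the dates in a column to the beginning of the month in Pandas DataFrame, use astype("datetime64[M]"). Newer Pandas versions (2.0 and later) may reject this cast, in which case dt.to_period("M").dt.to_timestamp() is an alternative.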
- Sorting DataFrame by dates in Pandas
  To sort by dates in Pandas DataFrame, use either sort_values(~) or sort_index(~).
- Using dates as the index of a DataFrame in Pandas
  To use dates as the index of a DataFrame in Pandas, you can either use Pandas' date_range(~) method, which returns a DatetimeIndex, or use to_datetime(~) to convert an existing index to DatetimeIndex.
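
Several of the recipes above can be combined into one short sketch. The DataFrame, its column names, and the dates below are made up for illustration, and the snippet assumes a reasonably recent Pandas version; it is not taken from the individual recipe pages.

```python
import pandas as pd

# Hypothetical data: one datetime column named "when"
df = pd.DataFrame({"when": pd.to_datetime(["2024-01-05 09:30", "2024-02-17 18:45"])})

# Extract units and names through the dt accessor
df["year"] = df["when"].dt.year
df["month"] = df["when"].dt.month
df["day_name"] = df["when"].dt.day_name()
df["week"] = df["when"].dt.isocalendar().week

# Drop the time component (the time is reset to midnight)
df["date_only"] = df["when"].dt.normalize()

# Snap each date to the beginning of its month via a monthly period
df["month_start"] = df["when"].dt.to_period("M").dt.to_timestamp()

# Offset every datetime by one day using a Timedelta
df["next_day"] = df["when"] + pd.Timedelta(days=1)

# Weekdays (Mon-Fri) between two dates
weekdays = pd.bdate_range("2024-01-01", "2024-01-14")

# Weekends between the same dates: custom frequency plus a Sat/Sun weekmask
weekends = pd.bdate_range("2024-01-01", "2024-01-14", freq="C", weekmask="Sat Sun")
```

The monthly-period route is used for the month-start step because it does not depend on how a given Pandas version handles casts to datetime64[M].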

Series Cookbook

Appending values to a Series in Pandas
To append a single value to a Pandas Series, perform direct assignment using [] syntax. To append multiple values, use Pandas' concat(~) method.
Applying a function to Series in Pandas
To apply a function to a Pandas Series, use the apply(~) method.
Binning values in a Pandas Series
To bin the values of a Series, use Pandas' cut(~) method.
Changing data type of Series in Pandas
To change the data type of a Series in Pandas, use the astype(~) method.
Checking if Series has missing values in Pandas
To check if a Series has missing values in Pandas, use the hasnans property.
Checking if a value is NaN in Pandas Series
To check if some value is NaN (missing) in a Pandas Series, use the isna(~) method.
Checking if all values are NaN in Pandas Series
To check if all values in a Pandas Series are missing (NaN), chain the methods isnull and all.
Checking if all values in Series are unique in Pandas
To check if all the values in a Series are unique in Pandas, use the Series' is_unique property.
Converting Python list to Pandas Series
To convert a Python list into a Pandas Series, directly pass the list into the Series constructor.
Converting Series of lists into DataFrame in Pandas
To horizontally expand the lists and convert the Series into a Pandas DataFrame, first convert the Series into a Python list, and then pass this into the DataFrame constructor.
Converting Pandas Series to Python list
To convert a Pandas Series to Python list, either use to_list(~) or Python's built-in list(~).
Converting Series to a Numpy array
To convert a Series to a NumPy array in Pandas, use the to_numpy(~) method.
Counting frequency of values in Pandas Series
To count the frequency of values in a Pandas Series, use the value_counts(~) method.
Creating a Series of zeroes in Pandas
To create a Series with zeros in Pandas, directly call the Series constructor.
Creating a Series with constant value in Pandas
To create a Series with constant value in Pandas, directly call the Series constructor.
Filtering strings based on length in Pandas Series
To filter strings based on length in Pandas Series, first get the length of each string using the str.len() method, and then create a boolean mask of strings that have the specified length. Finally, pass this mask into the loc property to fetch the corresponding entries in the Series.
Filtering values of a Series in Pandas
To filter values of a Series in Pandas, pass in a filter function to the loc property.
Getting frequency counts of values in intervals in Pandas Series
To get the frequency counts of values that fall under some intervals, first use Pandas' cut(~) method to partition the values into bins (segments), and then use value_counts(~) to get the corresponding frequency counts.
Getting index of largest value in Pandas Series
To get the index of the largest value in a Pandas Series, use the idxmax(~) method.
Getting index of smallest value in Pandas Series
To get the index of the smallest value in a Pandas Series, use the idxmin(~) method.
Getting index of value in Series in Pandas
To get the index of a value in a Pandas Series, directly use the [] syntax. To get the integer index of a value in Pandas Series, first convert the Series into an Index and then use the get_loc method.
Getting integer index of largest value in Pandas Series
To get the integer index of the largest value in a Pandas Series, use the argmax(~) method.
Getting integer index of smallest value in Pandas Series
To get the integer index of the smallest value in a Pandas Series, use the argmin(~) method.
Getting integer index of value in Pandas Series
To get the integer index of values in a Pandas Series, use NumPy's where(~) method.
Getting intersection of Series in Pandas
To get the intersection of two Series as a Series in Pandas, use NumPy's intersect1d(~) method.
Getting length of each string in Pandas Series
To get the length of each string in Pandas Series, use the str.len() method.
Getting list of integer indices where value is boolean True in Pandas Series
To get a list of integer indices where value is True in Pandas Series, use NumPy's where(~) method.
Getting the index of the nth value in Pandas Series
To get the index of the value located at the i-th integer index in Pandas Series, fetch the index using the index property, and then use square bracket notation to get the index.
Getting the most frequent value in Pandas Series
To get the most frequent value in the Pandas Series, use the mode method. To get the most frequent value and its count in the Pandas Series, use the value_counts method.
Getting value of Series using integer index in Pandas
To get a value from a Pandas Series using integer index, use the iloc property.
Grouping Series by its values in Pandas
To group the values of a Pandas Series, use the groupby(~) method.
Handling error - "Truth value of a Series is ambiguous" in Pandas
The fix is to avoid using "and" and "or" on Pandas Series, and instead use the bitwise operators "&" and "|", respectively.
Inverting a Series of booleans in Pandas
To invert a Series of booleans in Pandas, use the ~ operator.
Removing missing values from a Series in Pandas
To remove missing values from a Series in Pandas, use the dropna() method.
Removing substrings from strings in a Series in Pandas
To remove a substring from each string in a Pandas Series, use the str.replace(~) method.
Removing values from Series in Pandas
To remove values from a Pandas Series, either use the drop method or create a boolean mask and pass it into the loc property.
Resetting index of Series in Pandas
To reset the index of Series in Pandas, use the reset_index(~) method.
Sorting values in a Pandas Series
To sort the values in a Pandas Series, use the sort_values(~) method.
Splitting strings based on space in Pandas DataFrame
To split strings based on space in Pandas, use the Series.str.split(~) method.
Stripping leading and trailing whitespace in Pandas Series
To strip leading and trailing whitespace from each string in a Pandas Series, use the Series' str.strip(~) method.
Taking the floor or ceiling of values in Series in Pandas
To take the floor or ceiling of values in Pandas Series, use NumPy's floor(~) and ceil(~) methods.
Using index.get_loc(~) for multiple values in Pandas
To get the integer indexes of multiple values in the Pandas DataFrame's index, use df.index.get_indexer(~).
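
As a rough illustration, a handful of the Series recipes above can be chained together as follows. The values, index labels, and variable names are invented for this sketch, and it only covers a subset of the listed recipes.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([3.0, 1.0, np.nan, 3.0], index=["a", "b", "c", "d"])

has_missing = s.hasnans               # check for missing values
clean = s.dropna()                    # remove missing values

largest_label = clean.idxmax()        # index label of the largest value (first occurrence)
smallest_pos = clean.argmin()         # integer position of the smallest value
counts = clean.value_counts()         # frequency of each value
most_frequent = clean.mode().iloc[0]  # most frequent value

# Combine conditions with & rather than "and" to avoid the
# "Truth value of a Series is ambiguous" error
mask = (clean > 2) & (clean.index != "d")
filtered = clean.loc[mask]

# String recipes: strip whitespace, then filter by string length
words = pd.Series(["  apple ", "fig", "kiwi  "])
stripped = words.str.strip()
short = stripped.loc[stripped.str.len() == 3]

# Integer positions where a boolean condition holds, via NumPy's where
positions = np.where(stripped.str.len() > 3)[0].tolist()
```

Note that idxmax returns an index label while argmin returns an integer position, which is the distinction several of the recipes above rely on.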

Published by Isshin Inada
