import pandas as pd

df = pd.DataFrame({"A":[3,4,5,6],"B":[6,7,8,9],"C":[10,11,12,13]})
df
   A  B   C
0  3  6  10
1  4  7  11
2  5  8  12
3  6  9  13

We first need to split df into two objects: a DataFrame of features (X) and a Series of target values (y):

X = df.loc[:,["A","B"]]
y = df.loc[:,"C"]

Here, the : before the comma means that we want to fetch all rows, and the labels after the comma are the columns to fetch. Passing a list of labels returns a DataFrame, while passing a single label returns a Series.
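As a quick check of the selection above, the following sketch rebuilds the same df and inspects the types of X and y. The print calls are only for illustration and are not part of the original recipe.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 4, 5, 6], "B": [6, 7, 8, 9], "C": [10, 11, 12, 13]})

# A list of column labels keeps the result two-dimensional (DataFrame)
X = df.loc[:, ["A", "B"]]
# A single column label returns a one-dimensional Series
y = df.loc[:, "C"]

print(type(X).__name__, X.shape)   # DataFrame (4, 2)
print(type(y).__name__, y.shape)   # Series (4,)
```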

We then import the train_test_split(~) function from scikit-learn and use it to split X and y into training and testing sets:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

Here, note the following:

the rows are randomly shuffled before they are split. You can turn this off by setting shuffle=False.
by default, 75% of the rows go to the training set and 25% to the testing set.
setting random_state=1 makes the split reproducible: although the shuffling is random, the same random_state always produces the same split.
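The three points above can be checked with a short sketch. It rebuilds the same df, and the names X_train_again and X_train_ns are illustrative only. The comments describe properties that hold for this four-row example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"A": [3, 4, 5, 6], "B": [6, 7, 8, 9], "C": [10, 11, 12, 13]})
X = df.loc[:, ["A", "B"]]
y = df.loc[:, "C"]

# Default split: 4 rows -> 3 training rows and 1 testing row (75% / 25%)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
print(len(X_train), len(X_test))            # 3 1

# Calling again with the same random_state reproduces the same split
X_train_again, X_test_again, _, _ = train_test_split(X, y, random_state=1)
print(X_train.equals(X_train_again))        # True

# shuffle=False keeps the original row order: the last row becomes the test set
X_train_ns, X_test_ns, _, _ = train_test_split(X, y, shuffle=False)
print(list(X_train_ns.index), list(X_test_ns.index))   # [0, 1, 2] [3]
```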

Just for your reference, here's X_train:

X_train      # DataFrame
   A  B
2  5  8
0  3  6
1  4  7

Here's y_test:

y_test      # Series
3    13
Name: C, dtype: int64
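Because X and y are passed to the same call, each row of features stays paired with its target value after the split. The sketch below is one way to confirm this by comparing index labels. It rebuilds the same df so that it runs on its own.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"A": [3, 4, 5, 6], "B": [6, 7, 8, 9], "C": [10, 11, 12, 13]})
X = df.loc[:, ["A", "B"]]
y = df.loc[:, "C"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Features and targets carry the same index labels in the same order
print(X_train.index.equals(y_train.index))   # True
print(X_test.index.equals(y_test.index))     # True

# Together, the training and testing sets cover every original row exactly once
all_labels = sorted(X_train.index.tolist() + X_test.index.tolist())
print(all_labels)                            # [0, 1, 2, 3]
```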

Changing training and test size

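By default, the split is 75% training and 25% testing. We can change this by specifying the train_size and/or test_size parameters. A float between 0 and 1 is interpreted as a proportion of the rows, while an integer is interpreted as an absolute number of rows. You only need to specify one of them, since the other is set to the complement.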

To do a 50:50 split:

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, random_state=1)
X_train
   A  B
0  3  6
1  4  7
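As a further sketch, the same sizes can be obtained through test_size instead of train_size. scikit-learn also accepts an integer for either parameter, which it treats as an absolute number of rows rather than a proportion. The names X_train_n and X_test_n are illustrative only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({"A": [3, 4, 5, 6], "B": [6, 7, 8, 9], "C": [10, 11, 12, 13]})
X = df.loc[:, ["A", "B"]]
y = df.loc[:, "C"]

# test_size=0.5 gives the same set sizes as train_size=0.5
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)
print(len(X_train), len(X_test))       # 2 2

# An integer is interpreted as an absolute number of rows
X_train_n, X_test_n, _, _ = train_test_split(X, y, test_size=1, random_state=1)
print(len(X_train_n), len(X_test_n))   # 3 1
```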

Published by Isshin Inada
