NumPy | genfromtxt method
Numpy's genfromtxt(~) method reads a text file and parses its content into a Numpy array. Unlike Numpy's loadtxt(~) method, genfromtxt(~) can handle missing values.
Parameters
1. fname
| string
The name of the file. If the file is not in the same directory as the script, make sure to include the path to the file as well.
2. dtype
string
or type
or list<string>
or list<type>
| optional
The desired data-type of the constructed array. By default, dtype=float64
. This means that all integers will be converted to floats as well.
If you set dtype=None
, then Numpy will attempt to infer the type from your values. This may be significantly slower than setting the type yourself.
3. comments
string
| optional
If your input file contains comments, then you can specify what identifies a comment. By default, comments="#", that is, characters after the # on the same line will be treated as a comment. You can set this to None if your text file does not include any comments.
4. delimiter
string
| optional
The string used to separate your data. By default, any consecutive whitespace acts as the delimiter.
5. skiprows
| int
| optional
This parameter has been replaced by skip_header
in Numpy version 1.10.
6. skip_header
int
| optional
The number of rows to skip at the beginning of the file. Note that this includes comment lines. By default, skip_header=0.
7. skip_footer
int
| optional
The number of rows to skip at the end of the file. Note that this includes comment lines. By default, skip_footer=0.
8. converters
dict<int,function>
| optional
You can apply a mapping to transform your column values. The key is the integer index of the column, and the value is the desired mapping. Check the examples below for clarification. By default, converters=None.
9. missing
| string
| optional
This parameter has been replaced by missing_values
in Numpy version 1.10.
10. missing_values
string
or sequence<string>
| optional
The sequence of strings that will be treated as missing values. This is only relevant when usemask=True
. Consult examples for clarification.
11. filling_values
value
or dict
or sequence<value>
| optional
If a single value is passed then all missing and invalid values will be replaced by that value. By passing a dict, you can specify different fill values for different columns. The key is the column integer index, and the value is the fill value for that column.
12. usecols
int
or sequence
| optional
The integer indices of the columns you want to read. By default, usecols=None
, that is, all columns are read.
13. names
None
or True
or string
or sequence<string>
| optional
The field names of the resulting array. This parameter is relevant only for those who wish to create a structured array.
Type | Description |
---|---|
None | A standard array instead of a structured array will be returned. |
True | The first row after the specified skip_header lines will be treated as the field names. |
string | A single string containing the field names separated by comma. |
sequence | An array-like structure containing the field names. |
By default, names=None
.
As a side note, structured arrays are not commonly used since Series and DataFrames in the Pandas library are better alternatives.
14. excludelist
sequence
| optional
The passed strings will be appended to the default exclusion list of ["return", "file", "print"]. If a field name matches an entry in this list, an underscore is appended to that field name (e.g. the field name "abc" becomes "abc_"). This is only relevant for those who wish to create a structured array.
15. deletechars
string
of length one or sequence
or dict
| optional
The character(s) to delete from the names.
16. defaultfmt
string
| optional
The format of the resulting field names when no names are given. The syntax follows Python's %-style string formatting; the default is "f%i", which yields field names "f0", "f1", and so on.
17. autostrip
boolean
| optional
Whether or not to remove leading and trailing whitespace from the values. This is only applicable for values that are strings. By default, autostrip=False
.
18. replace_space
string
| optional
The string used to replace spaces in the field names. Note that the leading and trailing spaces will be removed. By default, replace_space="_"
.
19. case_sensitive
string
or boolean
| optional
How to handle the casing of the field names.
Value | Description |
---|---|
True | Leave the casing as is. |
False | Convert field names to uppercase. |
"upper" | Convert field names to uppercase. |
"lower" | Convert field names to lowercase. |
By default, case_sensitive=True
.
20. unpack
boolean
| optional
Instead of having one giant Numpy array, you could retrieve column arrays individually by setting this to True
. For instance, col_one, col_two = np.genfromtxt(~, unpack=True)
. By default, unpack=False
.
21. usemask
| boolean
| optional
Whether or not to return a masked array. By default, usemask=False
.
22. loose
boolean
| optional
If True, invalid values will be converted to nan
and no error will be raised. By default, loose=True
.
23. invalid_raise
boolean
| optional
If the number of values in a row does not match the number of columns, then an error is raised. If set to False, then invalid rows will be omitted from the resulting array. By default, invalid_raise=True
.
24. max_rows
int
| optional
The maximum number of rows to read. By default, all lines are read.
25. encoding
| string
| optional
The encoding to use when reading the file (e.g. "latin-1", "iso-8859-1"). By default, encoding="bytes".
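For instance, a minimal sketch (assuming here, purely for illustration, that my_data.txt was saved with Latin-1 encoding): a = np.genfromtxt("my_data.txt", delimiter=",", encoding="latin-1").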
Return value
A Numpy array with the imported data.
Examples
Basic usage
Suppose we have the following text-file called my_data.txt
:
1 2 3 4
5 6 7 8
To import this file:
a = np.genfromtxt("my_data.txt")a
array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
Note that this Python script resides in the same directory as my_data.txt
.
Also, the default data type is float64
, regardless of whether or not the numbers in the text file are all integers:
print(a.dtype)
float64
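As a side note, genfromtxt(~) also accepts file-like objects, so you can experiment without writing a file to disk. Here is a minimal sketch using io.StringIO as an in-memory stand-in for my_data.txt:
from io import StringIO
import numpy as np

data = StringIO("1 2 3 4\n5 6 7 8")   # same two rows as the file above
a = np.genfromtxt(data)
a
array([[1., 2., 3., 4.], [5., 6., 7., 8.]])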
Specifying the desired data type
Once again, suppose we have the following text-file called my_data.txt
:
1 2 3 4
5 6 7 8
Instead of using the default float64
, we can specify a type using dtype
:
a = np.genfromtxt("my_data.txt", dtype=int)a
array([[1, 2, 3, 4], [5, 6, 7, 8]])
Now, all the values are integers.
You can also pass a list of types to assign different types to different columns:
a = np.genfromtxt("my_data.txt", dtype=[np.int,32 int, np.float,32 float])a
array([(1, 2, 3., 4.), (5, 6, 7., 8.)], dtype=[('f0', '<i4'), ('f1', '<i8'), ('f2', '<f4'), ('f3', '<f8')])
Here, the i4
represents int32
while i8
represents int64
.
Note that this is a special type of Numpy array called a structured array. This type of array is not often used in practice since Series and DataFrames in the Pandas library are alternatives with more features.
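You can also let Numpy infer a type for each column by setting dtype=None. Here is a minimal sketch, assuming a hypothetical file with mixed column types and passing encoding="utf-8" so the text column is read as regular strings:
from io import StringIO
import numpy as np

data = StringIO("1 2.5 abc\n3 4.5 def")   # hypothetical integer, float and string columns
a = np.genfromtxt(data, dtype=None, encoding="utf-8")
a
array([(1, 2.5, 'abc'), (3, 4.5, 'def')], dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U3')])
Since the columns end up with different types, the result is again a structured array, and the exact integer width (i8 here) depends on your platform.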
Specifying a custom delimiter
Suppose our my_data.txt
file is as follows:
1,2
3,4
Since our data is comma-separated, set delimiter=","
like so:
a = np.genfromtxt("my_data.txt", delimiter=",")a
1,23,4
Handling comments
Suppose our my_data.txt
file is as follows:
1,2,3,4 / I'm the first row!
5,6,7,8 / I'm the second row!
To strip out comments in the text-file, specify comments
:
a = np.genfromtxt("my_data.txt", delimiter=",", comments="/")a
array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
Specifying skip_header
Suppose our my_data.txt
file is as follows:
1 2 3
4 5 6
7 8 9
To skip the first row:
a = np.genfromtxt("my_data.txt", skip_header=1)a
array([[4., 5., 6.], [7., 8., 9.]])
Specifying skip_footer
Suppose our my_data.txt
file is as follows:
1 2 3
4 5 6
7 8 9
To skip the last row:
a = np.genfromtxt("my_data.txt", skip_footer=1)a
array([[1., 2., 3.], [4., 5., 6.]])
Specifying converters
Suppose our my_data.txt
file is as follows:
1 2
3 4
Just as an arbitrary example, suppose we wanted to add 10 to all values of the 1st column, and make all the values of the 2nd column be 20:
a = np.genfromtxt("my_data.txt", converters={0: lambda x: int(x) + 10, 1: lambda x: 20})a
array([(11, 20), (13, 20)], dtype=[('f0', '<i8'), ('f1', '<i8')])
Here, the "f0"
and "f1"
are the field names, and the "i8"
denote a int64
data type.
Specifying missing_values
Suppose our my_data.txt
file is as follows:
3,??
,6
All missing and invalid values are treated as nan
, so you wouldn't need to specify missing_values="??"
here:
a = np.genfromtxt("my_data.txt", delimiter=",")a
array([[ 3., nan], [nan, 6.]])
Note that it is not possible to set a valid value, such as 6, as a missing value. The missing_values parameter comes into play only when you set usemask=True.
Here's usemask=True
without missing_values
:
a = np.genfromtxt("my_data.txt", delimiter=",", usemask=True)a
masked_array(
  data=[[3.0, nan],
        [--, 6.0]],
  mask=[[False, False],
        [ True, False]],
  fill_value=1e+20)
Notice how missing and invalid values are differentiated here: ?? has been mapped to nan with its mask flag set to False, while the genuinely missing value has been mapped to -- with its mask flag set to True.
Now, here's usemask=True
with missing_values="??"
:
a = np.genfromtxt("my_data.txt", delimiter=",", missing_values="??", usemask=True)a
masked_array(
  data=[[3.0, --],
        [--, 6.0]],
  mask=[[False, True],
        [ True, False]],
  fill_value=1e+20)
The key here is that ??, which is inherently an invalid value, is now treated as a missing value.
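As a side note, once you have a masked array, you can turn it back into an ordinary array with a fill of your choice via the masked array's filled(~) method. A minimal sketch reusing the call above:
a = np.genfromtxt("my_data.txt", delimiter=",", missing_values="??", usemask=True)
a.filled(0)
array([[3., 0.], [0., 6.]])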
Specifying filling_values
By default, all missing and invalid values are replaced by nan. To change this, specify the filling_values
like so:
a = np.genfromtxt("my_data.txt", delimiter=",", filling_values=0)a
array([[3., 0.], [0., 6.]])
You could also pass in a dictionary, with the following key-value pairs:
key: the column integer index
value: the fill value
For instance, to map all missing and invalid values in the first column to -1, and those in the second column to -2:
a = np.genfromtxt("my_data.txt", delimiter=",", filling_values={0:-1, 1:-2})a
array([[ 3., -2.], [-1., 6.]])
Reading only certain columns
Suppose our my_data.txt
file is as follows:
1 2 3
4 5 6
To read only the 1st and 3rd columns (i.e. column index 0 and 2):
a = np.genfromtxt("my_data.txt", usecols=[0,2])a
array([[1., 3.], [4., 6.]])
Specifying names
Suppose our my_data.txt file is as follows:
3 4
5 6
To assign a name to each column:
a = np.genfromtxt("my_data.txt", names=("A","B"))a
array([(3., 4.), (5., 6.)], dtype=[('A', '<f8'), ('B', '<f8')])
Here, we have assigned the name A to the first column and B to the second. Note that f8 just denotes the type float64.
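If the first row of the file itself holds the column labels, you can instead set names=True so that the header row is used for the field names. A minimal sketch, assuming a hypothetical my_data.txt that begins with a header row:
A B
3 4
5 6
Reading the file and accessing a column by name:
a = np.genfromtxt("my_data.txt", names=True)
a["A"]
array([3., 5.])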
Specifying excludelist
Suppose our my_data.txt
file is as follows:
3 4 5
6 7 8
To append a _
to certain names:
a = np.genfromtxt("my_data.txt", names=["A","B","C"], excludelist=["A"])a
array([(3., 4., 5.), (6., 7., 8.)], dtype=[('A_', '<f8'), ('B', '<f8'), ('C', '<f8')])
Notice how we have A_
as the field name for the first column.
Specifying deletechars
Suppose our my_data.txt
file is as follows:
3 4
5 6
To remove the character "c"
from the field names:
a = np.genfromtxt("my_data.txt", names=["Ab","BcD"], deletechars="c")a
array([(3., 4.), (5., 6.)], dtype=[('Ab', '<f8'), ('BD', '<f8')])
To remove multiple characters:
a = np.genfromtxt("my_data.txt", names=["Ab","BcD"], deletechars=["c","A"])a
array([(3., 4.), (5., 6.)], dtype=[('b', '<f8'), ('BD', '<f8')])
Specifying defaultfmt
Suppose our my_data.txt
file is as follows:
3 4
5 6
If the returned result is a structured array, and the names
parameter is not defined, then the field names take on the values "f0"
, "f1"
and so on by default:
a = np.genfromtxt("my_data.txt", dtype=[int, float])a
array([(3, 4.), (5, 6.)], dtype=[('f0', '<i8'), ('f1', '<f8')])
To customise this, pass the defaultfmt
parameter:
a = np.genfromtxt("my_data.txt", dtype=[int, float], defaultfmt="my_var_%i")a
array([(3, 4.), (5, 6.)], dtype=[('my_var_0', '<i8'), ('my_var_1', '<f8')])
Here, the %i
is a placeholder for the column integer index.
Specifying autostrip
Suppose our my_data.txt
file is as follows:
3,a, 4
5 ,b c,6
By default, all whitespaces that appear in the values are kept intact:
a = np.genfromtxt("my_data.txt", delimiter=",", dtype="U")a
array([['3', 'a', ' 4'], ['5 ', 'b c', '6']], dtype='<U5')
If you want to strip away the leading and trailing whitespaces, set autostrip=True
like so:
a = np.genfromtxt("my_data.txt", delimiter=",", autostrip=True, dtype="U")a
array([['3', 'a', '4'], ['5', 'b c', '6']], dtype='<U3')
Notice how the whitespace in "b c"
is still there.
Specifying replace_space
Suppose our my_data.txt
is as follows:
3 4
5 6
By default, the non-leading and non-trailing spaces are replaced by _
:
a = np.genfromtxt("my_data.txt", names=["A B", " C "])a
array([(3., 4.), (5., 6.)], dtype=[('A_B', '<f8'), ('C', '<f8')])
Notice how the leading and trailing spaces have been stripped.
To replace the spaces by a custom string, set the replace_space
parameter like so:
a = np.genfromtxt("my_data.txt", names=["A B", " C "], replace_space="K")a
array([(3., 4.), (5., 6.)], dtype=[('AKB', '<f8'), ('C', '<f8')])
Specifying case_sensitive
Suppose our my_data.txt
is as follows:
3 4
5 6
By default, case_sensitive is set to True, which means that the field names are left as is.
a = np.genfromtxt("my_data.txt", names=["Ab", "dC"])a
array([(3., 4.), (5., 6.)], dtype=[('Ab', '<f8'), ('dC', '<f8')])
To convert the field names to uppercase, set case_sensitive to either "upper" or False:
a = np.genfromtxt("my_data.txt", names=["Ab", "dC"], case_sensitive=False)a
array([(3., 4.), (5., 6.)], dtype=[('AB', '<f8'), ('DC', '<f8')])
To convert the field names to lowercase, set case_sensitive="lower":
a = np.genfromtxt("my_data.txt", names=["Ab", "dC"], case_sensitive="lower")a
array([(3., 4.), (5., 6.)], dtype=[('ab', '<f8'), ('dc', '<f8')])
Specifying unpack
Suppose our my_data.txt
file is as follows:
3 4
5 6
To retrieve the data per column instead of a single Numpy array:
col_one, col_two = np.genfromtxt("my_data.txt", unpack=True)
print("col_one:", col_one)
print("col_two:", col_two)
col_one: [3. 5.]
col_two: [4. 6.]
Specifying loose
Suppose our my_data.txt
file is as follows:
3 4
5 ??
By default, loose=True
, which means that invalid values (e.g. the ??
here) are converted into nan
:
a = np.genfromtxt("my_data.txt")a
array([[ 3., 4.], [ 5., nan]])
To raise an error if our file contains invalid values, set loose=False
, like so:
a = np.genfromtxt("my_data.txt", loose=False)a
ValueError: Cannot convert string '??'
Specifying invalid_raise
Suppose our my_data.txt file is as follows:
3,4
5
7,8
Here, the second row only contains 1 value even though the array seemingly has 2 columns.
By default, invalid_raise=True
, which means that if the file contains invalid rows, then an error is raised:
a = np.genfromtxt("my_data.txt", delimiter=",")a
ValueError: Some errors were detected! Line #2 (got 1 columns instead of 2)
We can choose to omit invalid rows by setting it to False
, like so:
a = np.genfromtxt("my_data.txt", delimiter=",", invalid_raise=False)a
array([[3., 4.], [7., 8.]])
No error is raised, but Numpy is nice enough to give us a warning:
ConversionWarning: Some errors were detected! Line #2 (got 1 columns instead of 2)
Specifying the desired dimension
Suppose our sample.txt
only had one row:
1 2 3 4
By default, genfromtxt(~) will generate a one-dimensional array:
a = np.genfromtxt("sample.txt")
a
array([1., 2., 3., 4.])
We can specify that we want a two-dimensional array by setting ndmin=2 (supported by genfromtxt(~) from NumPy 1.23 onwards):
a = np.genfromtxt("sample.txt", ndmin=2)
a
array([[1., 2., 3., 4.]])
Specifying max_rows
Suppose our my_data.txt
file is as follows:
1 2
3 4
5 6
To read only the first two rows instead of the entire file:
a = np.genfromtxt("myy_data.txt", max_rows=2)a
array([[1., 2.], [3., 4.]])
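These row-selection parameters also compose. For instance, a minimal sketch that skips the first row of the same file and then reads the next two by combining skip_header with max_rows:
a = np.genfromtxt("my_data.txt", skip_header=1, max_rows=2)
a
array([[3., 4.], [5., 6.]])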