Read_csv error when accessing directly from the website

The dataset can supposedly be accessed directly with the link (yes, read_csv accepts links too!), but when I try it I get this error:

ParserError Traceback (most recent call last)
in <module>()
3 # greenhouse_data= pd.read_html(url)[1]
4 # greenhouse_data.head()
----> 5 greenhouse_data= pd.read_csv(url)

3 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in read(self, nrows)
2155 def read(self, nrows=None):
2156 try:
-> 2157 data = self._reader.read(nrows)
2158 except StopIteration:
2159 if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 7, saw 3
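The thread never states the cause outright, but the fixes suggested below (switching to the RAW link, or using read_html) point to the original URL returning an HTML page rather than raw CSV text. Under that assumption, this minimal sketch reproduces the same kind of error by handing read_csv some HTML. The HTML snippet is made up for illustration and is not the real page.

```python
import io

import pandas as pd
from pandas.errors import ParserError

# Made-up stand-in for an HTML page (e.g. a GitHub file view) that
# read_csv receives instead of raw CSV text.
html_text = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    "<head>\n"
    "<title>Datasets</title>\n"
    "</head>\n"
    "<body>\n"
    "<p>Region, Population, Area</p>\n"
)

message = ""
try:
    # The first line has no commas, so the parser expects 1 field per row;
    # line 7 contains two commas (3 fields), which the C parser rejects.
    pd.read_csv(io.StringIO(html_text))
except ParserError as err:
    message = str(err)

print(message)
```

The first lines of an HTML page usually contain no commas, so the parser settles on one column and then fails at the first line that happens to contain commas.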

I think they have fixed the link. Please do check again to see if it works.

If it still doesn’t work, you can download the file in a specific format (choose CSV) and read it directly from your local system.


The link did not work for me either. I followed the link, then clicked on “RAW”, then used that link (https://raw.githubusercontent.com/dphi-official/Datasets/master/Standard_Metropolitan_Areas_Data-data.csv).

The raw link works fine.
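The question does not show the exact link that failed, so this sketch assumes it was a standard GitHub file-view URL of the form https://github.com/<owner>/<repo>/blob/<branch>/<path>. `to_raw_github_url` is a hypothetical helper, not part of pandas or any GitHub API; it just rewrites such a URL into the raw.githubusercontent.com form used above.

```python
from urllib.parse import urlparse


def to_raw_github_url(blob_url: str) -> str:
    """Rewrite a github.com '.../blob/<branch>/<path>' URL to its raw form.

    Hypothetical helper: assumes the usual GitHub URL layout
    https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    """
    parsed = urlparse(blob_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    parts = parsed.path.strip("/").split("/")
    # Expected parts: [owner, repo, "blob", branch, *path]
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, branch, *path = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


# Assumed form of the original link (not shown in the thread).
blob_url = "https://github.com/dphi-official/Datasets/blob/master/Standard_Metropolitan_Areas_Data-data.csv"
raw_url = to_raw_github_url(blob_url)
print(raw_url)

# With network access, the raw URL can be passed straight to pandas:
# import pandas as pd
# df = pd.read_csv(raw_url)
```

The raw URL serves the file contents as plain text, which is what read_csv expects.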


@balaleo, use pandas’ read_html to parse all the HTML tables on the page. It returns only 3 tables, so you can find the required one manually in that list.
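pd.read_html returns a list of DataFrames, one per HTML table. Instead of picking the table by position (as in the commented-out `pd.read_html(url)[1]` in the question), a small sketch can pick it by its columns. `find_table`, `load_page_tables` and the column names are hypothetical; read_html also needs an HTML parser such as lxml (or BeautifulSoup with html5lib) installed.

```python
import pandas as pd


def load_page_tables(page_url: str) -> list:
    """Parse every <table> on the page into a list of DataFrames."""
    return pd.read_html(page_url)


def find_table(tables, required_columns):
    """Return the first DataFrame that has all of required_columns."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise LookupError(f"no table has columns {sorted(required)}")


# In-memory stand-ins for what load_page_tables(page_url) would return.
tables = [
    pd.DataFrame({"name": ["x"], "stars": [1]}),
    pd.DataFrame({"region": ["a"], "value": [2]}),
]
wanted = find_table(tables, ["region", "value"])
print(wanted)
```

Selecting by column names keeps working if the page gains or loses an unrelated table, whereas a fixed index like `[1]` would silently pick the wrong one.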


In most cases, the cause is one of:

  • the delimiter used in your data, or
  • the parser being confused by the file’s header/columns.

To solve pandas.parser.CParserError: Error tokenizing data (pandas.errors.ParserError in newer pandas versions), try specifying the sep and/or header arguments when calling read_csv:

pandas.read_csv(file_name, sep='your_delimiter', header=None)
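A small self-contained sketch of the sep/header fix, using made-up semicolon-separated data without a header row, held in memory instead of a real file:

```python
import io

import pandas as pd

# Made-up data: semicolon-separated, no header row.
raw = "a;1;2\nb;3;4\n"

# With the defaults (sep=",", header=0) each whole line is one field and
# the first data row is consumed as the column name.
wrong = pd.read_csv(io.StringIO(raw))
print(wrong.shape)

# Telling pandas the real delimiter and that there is no header row
# gives 2 rows x 3 columns, with integer column labels 0, 1, 2.
right = pd.read_csv(io.StringIO(raw), sep=";", header=None)
print(right)
```

If you want named columns instead of 0, 1, 2, pass `names=[...]` together with `header=None`.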

The Error tokenizing data message can also appear when you use a separator (e.g. a comma ‘,’) as the delimiter and some rows contain more separators than expected, i.e. the offending row has more fields than the header defines. In that case, either remove the additional field or remove the extra separator if it was added by mistake. The better solution is to inspect the offending file and fix it manually, so you don’t need to skip the bad lines.
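To investigate the file before fixing it, a sketch like the one below lists the rows whose field count differs from the header's, using only Python’s standard csv module. `find_bad_rows` is a hypothetical helper. It flags rows with too many and too few fields; pandas’ C parser only raises on the "too many" case and pads short rows with NaN. If you really do want to skip bad lines instead, pandas 1.3+ accepts `on_bad_lines="skip"` in read_csv (older versions used `error_bad_lines=False`).

```python
import csv
import io


def find_bad_rows(text, sep=","):
    """Return (line_number, field_count) for rows whose field count differs
    from the header's. Line numbers are 1-based, with the header on line 1."""
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    expected = len(header)
    bad = []
    for row in reader:
        # Skip blank lines (csv yields [] for them), flag mismatched rows.
        if row and len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


# Made-up sample: line 3 has an extra separator.
sample = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"
print(find_bad_rows(sample))
```

For a file on disk, read it with `open(path, newline="")` and pass the contents (or adapt the function to take the file object directly).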

In some cases, the pandas.parser.CParserError appears when reading a file written by DataFrame.to_csv(). This can happen when a column name contains a carriage return ('\r'): to_csv() then effectively writes the subsequent column names into the first column of the data, so the first few rows have a different number of columns from the rest. That mismatch is one cause of the CParserError.
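Assuming you control the code that writes the file, one way to avoid this is to clean the column names before calling to_csv(). This sketch uses a made-up DataFrame and replaces carriage returns and newlines in the column names with a space, then checks that the CSV reads back cleanly:

```python
import io

import pandas as pd

# Made-up frame whose second column name contains a stray carriage return.
df = pd.DataFrame({"id": [1, 2], "total\rsales": [10, 20]})

# Replace any run of \r / \n characters in the column names with a space,
# so every header field stays on a single line of the CSV.
df.columns = df.columns.str.replace(r"[\r\n]+", " ", regex=True)

buffer = io.StringIO()
df.to_csv(buffer, index=False)
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(list(round_trip.columns))
```

The same cleanup works on a DataFrame you read from an existing file, before writing it back out.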