Avoid This Costly Mistake When Indexing A…

Avi Chawla

Apr 22, 2023

Row-then-column is not the same as Column-then-row.

2 Comments

D.W. Eversole

Apr 22, 2023

Let me see if I get this straight. Say df = pd.DataFrame(somedata).

df.iloc[1] pulls a single row by position (the second row, since iloc is zero-based), uses a lot of memory, and is slow. (Let's assume the index has a name like "apples" for this row.)

But if you transpose the df dataframe first, you could call df['apples'] and get the same data returned as a Series, and it would be faster and more memory efficient?

Is this correct?
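To make the question concrete, here is a minimal sketch. The frame, its index labels ("apples", "pears", "plums"), and its column names are made up for illustration. It only checks that the two access paths return the same values; it says nothing about speed or memory.

```python
import pandas as pd

# Hypothetical data: three labelled rows, two numeric columns.
df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0], "weight": [10.0, 20.0, 30.0]},
    index=["apples", "pears", "plums"],
)

# Row access on the original frame, by label and by position.
row_by_label = df.loc["apples"]
row_by_position = df.iloc[0]  # "apples" sits at position 0 in this frame

# Column access on the transposed frame.
transposed = df.T
col_of_transposed = transposed["apples"]

# Both paths yield a Series indexed by the original column names.
same_values = row_by_label.equals(col_of_transposed)
```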

Avi Chawla

Apr 23, 2023 (edited)

For "df.iloc[1]", it's hard to comment on memory usage when you fetch the row. But it is certainly slow, as shown in the demo and for the reasons mentioned.

Coming to the transpose part: technically, yes, that would be faster once you are done with the transpose. But transposing the dataframe isn't recommended, as it will map the whole dataframe to a new memory location, which in turn takes additional time. Also, if you have plenty of rows in the original df, it does not make much intuitive sense to transpose and end up with plenty of columns.

The best thing to do is to eliminate iterating over rows by finding a vectorized solution. But if you are not able to do that, a better approach is to convert the dataframe to a NumPy array. A NumPy array is row-major by default, so it offers much faster row iteration than pandas. See this: https://avichawla.substack.com/p/if-you-are-not-able-to-code-a-vectorized
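The three options above can be sketched roughly as follows. The frame and its column names are hypothetical, and the per-row sum only stands in for whatever work is done on each row. The np.ascontiguousarray call is an addition here, not something from the post: it makes sure the array really is laid out row-major, since the layout returned by to_numpy() can vary.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 4 rows, 3 float columns.
df = pd.DataFrame(
    np.arange(12, dtype=np.float64).reshape(4, 3),
    columns=["a", "b", "c"],
)

# Option 1: iterate over rows through pandas (the slow path discussed above).
sums_iloc = [df.iloc[i].sum() for i in range(len(df))]

# Option 2: convert once, then iterate over rows of the NumPy array.
# ascontiguousarray guarantees a row-major (C-order) layout.
arr = np.ascontiguousarray(df.to_numpy())
sums_numpy = [row.sum() for row in arr]

# Option 3: the vectorized form, with no Python-level row loop at all.
sums_vectorized = df.sum(axis=1).to_numpy()
```

All three produce the same per-row sums. To compare their speed on your own data, wrap each option in timeit.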

Daily Dose of Data Science