Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
_s = pd.Series([{"k": f"{i}", "m": "q"} for i in range(5)])
pd.json_normalize(_s, record_prefix="T")
Issue Description
The record_prefix argument is completely ignored and the resulting columns are not renamed.
   k  m
0  0  q
1  1  q
2  2  q
3  3  q
4  4  q
The issue is in L517-L527, where a shortcut is taken for speed.
One option to fix this would be to pass the record_prefix as prefix to nested_to_record.
This is hidden in the documentation example by using a positional argument for the record path:
>>> data = {"A": [1, 2]}
>>> pd.json_normalize(data, "A", record_prefix="Prefix.")
   Prefix.0
0         1
1         2
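For comparison, here is a minimal sketch of the same documentation call with the record path passed by keyword rather than positionally. The variable name by_keyword is only illustrative. It shows that the prefix is applied whenever record_path is given, which is why the documentation example does not expose the problem.

```python
import pandas as pd

data = {"A": [1, 2]}

# Same call as the documentation example, but with record_path spelled out.
by_keyword = pd.json_normalize(data, record_path="A", record_prefix="Prefix.")

print(list(by_keyword.columns))  # ['Prefix.0']
```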
Expected Behavior
The prefix should be applied (or an explicit warning should be given):
   T.k  T.m
0    0    q
1    1    q
2    2    q
3    3    q
4    4    q
Installed Versions
Comment From: SumitkCodes
@JensHeinrich Can you check #62215
Comment From: JensHeinrich
> @JensHeinrich Can you check #62215
A quick check seems to show it working as expected now. Thanks!
Comment From: rhshadrach
Thanks for the report. From the documentation, best I can guess, the intention was for record_prefix to only apply when record_path was specified. However, it does make sense to me to apply this prefix regardless. The code that applies this prefix later in the method can be used here too:
    if record_prefix is not None:
        result = result.rename(columns=lambda x: f"{record_prefix}{x}")
I'm positive on this - PRs welcome!
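As a rough illustration of what that rename would produce, the sketch below applies the same two lines to the output of json_normalize from outside the library. This is a user-side workaround under the assumption that the prefix is "T."; it is not the actual patch, and the names records and record_prefix are local variables chosen for the example.

```python
import pandas as pd

records = [{"k": str(i), "m": "q"} for i in range(5)]
record_prefix = "T."  # assumed prefix, including the separator

# Normalize without record_path, then apply the prefix by hand.
result = pd.json_normalize(records)
if record_prefix is not None:
    result = result.rename(columns=lambda x: f"{record_prefix}{x}")

print(list(result.columns))  # ['T.k', 'T.m']
```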
Comment From: JensHeinrich
> Thanks for the report. From the documentation, best I can guess, the intention was for record_prefix to only apply when record_path was specified. However, it does make sense to me to apply this prefix regardless. The code that applies this prefix later in the method can be used here too:
>
>     if record_prefix is not None:
>         result = result.rename(columns=lambda x: f"{record_prefix}{x}")
My instinct would have been similar to #62215, changing https://github.com/SumitkCodes/pandas/blob/3e1d6d5bc853bd8bc983291b12aec2dbf477dde6/pandas/io/json/_normalize.py#L526 to:
    if record_prefix:
        data = nested_to_record(data, prefix=record_prefix, sep=sep, max_level=max_level + 1, level=1)
    else:
        data = nested_to_record(data, sep=sep, max_level=max_level)
But I just realized this would behave differently from the record_path code path, as it would be equivalent to
result = result.rename(columns=lambda x: f"{record_prefix}{sep}{x}")
instead of
result = result.rename(columns=lambda x: f"{record_prefix}{x}")
But ignoring sep at https://github.com/SumitkCodes/pandas/blob/3e1d6d5bc853bd8bc983291b12aec2dbf477dde6/pandas/io/json/_normalize.py#L580 is unexpected, IMO.
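To make the difference concrete, here is a small sketch that applies both renames to a toy frame. It assumes record_prefix="T" (as in the reproducible example) and the default sep="."; the frame df and the names with_sep and without_sep are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"k": ["0", "1"], "m": ["q", "q"]})
record_prefix = "T"
sep = "."

# Prefix joined with the separator, as a nested_to_record-style prefix would be.
with_sep = df.rename(columns=lambda x: f"{record_prefix}{sep}{x}")
# Prefix prepended as-is, as the existing rename in the record_path branch does.
without_sep = df.rename(columns=lambda x: f"{record_prefix}{x}")

print(list(with_sep.columns))     # ['T.k', 'T.m']
print(list(without_sep.columns))  # ['Tk', 'Tm']
```

Only the first variant yields the T.k / T.m columns shown under Expected Behavior when the prefix is given without a trailing separator.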
Comment From: PluvioXO
I've written a solution that should be optimal and passes the unit tests I made. Sending a PR now.
Comment From: PluvioXO
From what I can see, that should fix it. If I missed something, let me know @JensHeinrich