Research
-
[X] I have searched the [pandas] tag on StackOverflow for similar questions.
-
[X] I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
Pandas json_normalize question
Question about pandas
I am trying to understand an issue I encounter with json_normalize. This may be a bug? The following code:
import pandas
data = [{
"id": 1039835,
"status": "on_track",
"development_plan": {
"id": 23858,
"items": [{
"id": 176498,
"item_name": "itemname 1"
}, {
"id": 172450,
"item_name": "itemname 2"
}, {
"id": 172451,
"item_name": "itemname 3"
}
]
},
"user": {
"id": 714067
}
}
]
print(
pandas.json_normalize(data,
['development_plan', 'items'],
[['development_plan', 'id'],['status'],['user', 'id'],['user']],
errors = 'ignore'
)
)
Returns this dataframe:
id item_name development_plan.id status user.id user
0 176498 itemname 1 23858 on_track 23858 {'id': 714067} 1 172450 itemname 2 23858 on_track 23858 {'id': 714067} 2 172451 itemname 3 23858 on_track 23858 {'id': 714067}
The question is why the user.id is wrong? It returns the development_plan.id. Adding the complete user dictionary does return the correct user.id though.
Is this a bug or am I missing something?
Comment From: RamMarthi9
Hi phan480,
Use this piece of code: I think this should help u. Ping me if this is working. import pandas as pd
data = [{ "id": 1039835, "status": "on_track", "development_plan": { "id": 23858, "items": [{ "id": 176498, "item_name": "itemname 1" }, { "id": 172450, "item_name": "itemname 2" }, { "id": 172451, "item_name": "itemname 3" } ] }, "user": { "id": 714067 } } ]
Extract data
records = [] for entry in data: status = entry['status'] user_id = entry['user']['id'] development_plan_id = entry['development_plan']['id'] for item in entry['development_plan']['items']: record = { 'id': item['id'], 'item_name': item['item_name'], 'development_plan_id': development_plan_id, 'status': status, 'user_id': user_id } records.append(record)
Create DataFrame
df = pd.DataFrame(records)
print(df)
I got the following result: id item_name development_plan_id status user_id 0 176498 itemname 1 23858 on_track 714067 1 172450 itemname 2 23858 on_track 714067 2 172451 itemname 3 23858 on_track 714067
Comment From: phan480
Thank you for your response, yes it works. But doesn't use the json_normalize functionality? I am using this function to parse a multitude of api's and their responses. I configure their record_path and meta data columns in configuration data.
I would like to avoid creating a unique way to parse every API.
Comment From: yuanx749
I believe there is a bug, and seems this pr #59225 can fix it.