Research

  • [X] I have searched the [pandas] tag on StackOverflow for similar questions.

  • [X] I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

Pandas json_normalize question

Question about pandas

I am trying to understand an issue I encounter with json_normalize. This may be a bug? The following code:

import pandas
data = [{
        "id": 1039835,
        "status": "on_track",
        "development_plan": {
            "id": 23858,
            "items": [{
                    "id": 176498,
                    "item_name": "itemname 1"
                }, {
                    "id": 172450,
                    "item_name": "itemname 2"
                }, {
                    "id": 172451,
                    "item_name": "itemname 3"
                }
            ]
        },
        "user": {
            "id": 714067
        }
    }
]
print(
    pandas.json_normalize(data,
                          ['development_plan', 'items'],
                          [['development_plan', 'id'],['status'],['user', 'id'],['user']],
                          errors = 'ignore'
    )
)

Returns this dataframe:

   id   item_name development_plan.id    status user.id            user

0 176498 itemname 1 23858 on_track 23858 {'id': 714067} 1 172450 itemname 2 23858 on_track 23858 {'id': 714067} 2 172451 itemname 3 23858 on_track 23858 {'id': 714067}

The question is why the user.id is wrong? It returns the development_plan.id. Adding the complete user dictionary does return the correct user.id though.

Is this a bug or am I missing something?

Comment From: RamMarthi9

Hi phan480,

Use this piece of code: I think this should help u. Ping me if this is working. import pandas as pd

data = [{ "id": 1039835, "status": "on_track", "development_plan": { "id": 23858, "items": [{ "id": 176498, "item_name": "itemname 1" }, { "id": 172450, "item_name": "itemname 2" }, { "id": 172451, "item_name": "itemname 3" } ] }, "user": { "id": 714067 } } ]

Extract data

records = [] for entry in data: status = entry['status'] user_id = entry['user']['id'] development_plan_id = entry['development_plan']['id'] for item in entry['development_plan']['items']: record = { 'id': item['id'], 'item_name': item['item_name'], 'development_plan_id': development_plan_id, 'status': status, 'user_id': user_id } records.append(record)

Create DataFrame

df = pd.DataFrame(records)

print(df)

I got the following result: id item_name development_plan_id status user_id 0 176498 itemname 1 23858 on_track 714067 1 172450 itemname 2 23858 on_track 714067 2 172451 itemname 3 23858 on_track 714067

Comment From: phan480

Thank you for your response, yes it works. But doesn't use the json_normalize functionality? I am using this function to parse a multitude of api's and their responses. I configure their record_path and meta data columns in configuration data.

I would like to avoid creating a unique way to parse every API.

Comment From: yuanx749

I believe there is a bug, and seems this pr #59225 can fix it.