Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
data = read_xml(
    ...,
    parser='lxml',
    stylesheet=stylesheet,
    iterparse=iterparse,
)
The stylesheet is ignored.
Issue Description
if self.iterparse is None:
    self.xml_doc = self._parse_doc(self.path_or_buffer)
    if self.stylesheet:
        self.xsl_doc = self._parse_doc(self.stylesheet)
        self.xml_doc = self._transform_doc()
    elems = self._validate_path()
This should be continued with:
elif self.stylesheet:
    raise SomeExceptionExplaining("iterparse and stylesheet cannot be used together")
Or, better, is it possible to implement it?
Silent failure is definitely misleading, and it took quite some time to track down: everything looked correct, and other tools confirmed that the stylesheet transformed the document correctly, yet read_xml yielded different results.
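The check suggested above can be sketched as a standalone function. This is only an illustration, not a patch against the pandas parser: the name `check_xml_options` and the choice of `ValueError` are assumptions made for this example.

```python
from typing import Any, Optional


def check_xml_options(iterparse: Optional[dict], stylesheet: Optional[Any]) -> None:
    """Hypothetical guard: fail loudly instead of silently ignoring the stylesheet."""
    # Both options given: the stylesheet would otherwise be skipped without notice.
    if iterparse is not None and stylesheet is not None:
        raise ValueError("iterparse and stylesheet cannot be used together")


# Either option on its own passes the check.
check_xml_options(None, "style.xsl")
check_xml_options({"row": ["a", "b"]}, None)
```

With a check like this, passing both arguments would raise immediately rather than return an untransformed result.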
Expected Behavior
Raise an exception, or implement stylesheet support with iterparse.
Installed Versions
Comment From: ParfaitG
Hi, apologies for the late reply. Since `iterparse` parses XML documents iteratively and never holds the full tree in memory, while XSLT stylesheets require reading the full tree, `iterparse` and `stylesheet` cannot be used at the same time to parse XML. However, if your use case does use both methods on the same XML, please post such an interesting example.
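To illustrate the streaming style described above, here is a minimal sketch using the standard library's `xml.etree.ElementTree.iterparse`. It is not the pandas implementation; the sample document and the helper name `stream_rows` are made up for this example. Each row is read and then cleared, so the complete tree that an XSLT transform would need is never available at once.

```python
import io
import xml.etree.ElementTree as ET

# Small made-up document standing in for a large file on disk.
XML = b"<data><row><a>1</a></row><row><a>2</a></row><row><a>3</a></row></data>"


def stream_rows(source):
    """Yield the text of <a> for each <row>, clearing the row once it is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row":
            yield elem.findtext("a")
            elem.clear()  # drop the row's children so they are not kept in memory


values = list(stream_rows(io.BytesIO(XML)))
```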
Rather than raising exceptions for the many possible combinations of these optional arguments in IO methods, the docs should cover this. The `read_xml` and IO tools docs already mention that `iterparse` does not use `xpath`, since it is the alternative method to traverse and parse trees, and that `xpath` and `stylesheet` work together. Maybe added clarity can help.