Describe the bug
Selecting from a partitioned Parquet table with EXPLAIN ANALYZE raises this error:
Exception: DataFusion error: Internal error: Unsupported logical plan: Analyze must be root of the plan.
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues
To Reproduce
import os
import shutil
import pyarrow as pa
import pyarrow.parquet as pq
import datafusion
BASE_DIR = "repro_analyze_bug"
if os.path.exists(BASE_DIR):
    shutil.rmtree(BASE_DIR)
partition_dir = f"{BASE_DIR}/a=1"
os.makedirs(partition_dir)
data_schema = pa.schema([
    ('b', pa.int32())
])
data_table = pa.Table.from_arrays(
    [[10, 20, 30, 40, 50, 60]],
    schema=data_schema
)
pq.write_table(data_table, f"{partition_dir}/data.parquet")
ctx = datafusion.SessionContext()
ctx.sql(f"""
    CREATE EXTERNAL TABLE my_table (
        b INT
    )
    STORED AS PARQUET
    LOCATION '{BASE_DIR}/a=*/*.parquet'
    PARTITIONED BY (a INT)
""")
result = ctx.sql("EXPLAIN ANALYZE SELECT * FROM my_table")
# show() triggers the exception; collect() does not
result.show()
# result output in ipython:
#[ins] In [2]: result
#Out[2]:
# DataFrame()
# +-------------------+-----------------------+
# | plan_type         | plan                  |
# +-------------------+-----------------------+
# | Plan with Metrics | EmptyExec, metrics=[] |
# |                   |                       |
# +-------------------+-----------------------+
Expected behavior
I would expect no exception and the query plan to be displayed.
Additional context
The table is selectable but appears to have 0 rows. I'm not sure why.
[ins] In [7]: ctx.sql("select count(*) from my_table")
Out[7]:
DataFrame()
+----------+
| count(*) |
+----------+
| 0 |
+----------+
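One way to narrow down the 0-row result might be to confirm, outside DataFusion, that the pattern used in LOCATION actually matches the file written by the repro script. The sketch below uses only the Python standard library. The helper name partition_files is hypothetical, and the placeholder file is empty rather than real Parquet, since only the path matching is being checked. It does not assume that DataFusion expands glob patterns in LOCATION the same way Python's glob module does.

```python
import glob
import os
import tempfile


def partition_files(base_dir):
    """Return the paths matched by the 'a=*/*.parquet' pattern under base_dir."""
    # Escape base_dir so only the partition and file parts act as wildcards.
    pattern = os.path.join(glob.escape(base_dir), "a=*", "*.parquet")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Stand-in for the repro layout: <base_dir>/a=1/data.parquet
    with tempfile.TemporaryDirectory() as base_dir:
        os.makedirs(os.path.join(base_dir, "a=1"))
        # An empty placeholder is enough to test the path pattern.
        open(os.path.join(base_dir, "a=1", "data.parquet"), "wb").close()
        print(partition_files(base_dir))
```

If the same call on repro_analyze_bug returns the data file, the layout on disk is as intended, and the empty result is more likely related to how the LOCATION string is interpreted.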