`pub fn with_projection(self, mask: ProjectionMask) -> Self` can read data from the provided column indexes, but the entire parquet file still has to be supplied as input. FYI, arrow2 can read the data for a single column on its own:

```rust
let page_meta_data = PageMetaData {
    column_start: meta.offset,
    num_values: meta.num_values as i64,
    compression: Self::to_parquet_compression(compression)?,
    descriptor: column_descriptor.descriptor.clone(),
};
let pages = PageReader::new_with_page_meta(
    chunk,
    page_meta_data,
    Arc::new(|_, _| true),
    vec![],
    usize::MAX,
);
```

```rust
/// A fallible [`Iterator`] of [`CompressedDataPage`]. This iterator reads pages back
/// to back until all pages have been consumed.
/// The pages from this iterator always have [`None`] [`crate::page::CompressedDataPage::selected_rows()`] since
/// filter pushdown is not supported without a
/// pre-computed [page index](https://github.com/apache/parquet-format/blob/master/PageIndex.md).
pub struct PageReader<R: Read> {
}
```

`PageReader` can consume all pages.
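For context, here is a minimal sketch of how the projection API from the question is typically used with arrow-rs's sync reader. The file name and the leaf column indexes are hypothetical; only the projected columns are decoded, but the reader is handed the whole file:

```rust
use std::fs::File;

use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use parquet::arrow::ProjectionMask;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical file name; the builder reads the footer and metadata first.
    let file = File::open("data.parquet")?;
    let builder = ParquetRecordBatchReaderBuilder::try_new(file)?;

    // Select leaf columns 0 and 2 by index (indexes are placeholders).
    let mask = ProjectionMask::leaves(builder.parquet_schema(), [0, 2]);

    // Only the projected columns are decoded into record batches.
    let reader = builder.with_projection(mask).build()?;
    for batch in reader {
        println!("read {} rows", batch?.num_rows());
    }
    Ok(())
}
```

`ProjectionMask::leaves` selects by leaf column index in the parquet schema, while `ProjectionMask::roots` selects by top-level field, which matters for nested schemas.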
The readers automatically perform IO pushdown: they will only fetch the byte ranges needed. This includes column projection, and extends through to row group and page pruning, late materialization, etc. The readers aim to be batteries included; you shouldn't need to worry about pages, column chunks, etc., it will just do the right thing.
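To illustrate the pushdown described above, here is a sketch using the async reader, assuming the parquet crate's `async` feature plus `tokio` and `futures` as dependencies; the file name and column index are placeholders:

```rust
use futures::TryStreamExt;
use parquet::arrow::async_reader::ParquetRecordBatchStreamBuilder;
use parquet::arrow::ProjectionMask;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // tokio::fs::File implements AsyncFileReader, so the builder can issue
    // targeted range reads instead of loading the whole file into memory.
    let file = tokio::fs::File::open("data.parquet").await?;
    let builder = ParquetRecordBatchStreamBuilder::new(file).await?;

    // Project a single leaf column; only its byte ranges are fetched.
    let mask = ProjectionMask::leaves(builder.parquet_schema(), [0]);

    let stream = builder.with_projection(mask).build()?;
    let batches: Vec<_> = stream.try_collect().await?;
    println!("read {} batches", batches.len());
    Ok(())
}
```

Because the builder is handed an `AsyncFileReader` rather than a fully materialized buffer, it can issue range requests for just the footer, the metadata, and the projected column chunks, which is the IO pushdown behavior described above.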