Import python venv for stability
This commit is contained in:
+153
@@ -0,0 +1,153 @@
|
||||
Metadata-Version: 2.4
|
||||
Name: fastparquet
|
||||
Version: 2025.12.0
|
||||
Summary: Python support for Parquet file format
|
||||
Home-page: https://github.com/dask/fastparquet/
|
||||
Author: Martin Durant
|
||||
Author-email: mdurant@anaconda.com
|
||||
License: Apache License 2.0
|
||||
Classifier: Development Status :: 4 - Beta
|
||||
Classifier: Intended Audience :: Developers
|
||||
Classifier: Intended Audience :: System Administrators
|
||||
Classifier: License :: OSI Approved :: Apache Software License
|
||||
Classifier: Programming Language :: Python
|
||||
Classifier: Programming Language :: Python :: 3
|
||||
Classifier: Programming Language :: Python :: 3.10
|
||||
Classifier: Programming Language :: Python :: 3.11
|
||||
Classifier: Programming Language :: Python :: 3.12
|
||||
Classifier: Programming Language :: Python :: 3.13
|
||||
Classifier: Programming Language :: Python :: 3.14
|
||||
Classifier: Programming Language :: Python :: Implementation :: CPython
|
||||
Requires-Python: >=3.10
|
||||
License-File: LICENSE
|
||||
Requires-Dist: pandas>=1.5.0
|
||||
Requires-Dist: numpy
|
||||
Requires-Dist: cramjam>=2.3
|
||||
Requires-Dist: fsspec
|
||||
Requires-Dist: packaging
|
||||
Provides-Extra: lzo
|
||||
Requires-Dist: python-lzo; extra == "lzo"
|
||||
Dynamic: author
|
||||
Dynamic: author-email
|
||||
Dynamic: classifier
|
||||
Dynamic: description
|
||||
Dynamic: home-page
|
||||
Dynamic: license
|
||||
Dynamic: license-file
|
||||
Dynamic: provides-extra
|
||||
Dynamic: requires-dist
|
||||
Dynamic: requires-python
|
||||
Dynamic: summary
|
||||
|
||||
fastparquet
|
||||
===========
|
||||
|
||||
.. image:: https://github.com/dask/fastparquet/actions/workflows/main.yaml/badge.svg
|
||||
:target: https://github.com/dask/fastparquet/actions/workflows/main.yaml
|
||||
|
||||
.. image:: https://readthedocs.org/projects/fastparquet/badge/?version=latest
|
||||
:target: https://fastparquet.readthedocs.io/en/latest/
|
||||
|
||||
fastparquet is a python implementation of the `parquet
|
||||
format <https://github.com/apache/parquet-format>`_, aiming integrate
|
||||
into python-based big data work-flows. It is used implicitly by
|
||||
the projects Dask, Pandas and intake-parquet.
|
||||
|
||||
We offer a high degree of support for the features of the parquet format, and
|
||||
very competitive performance, in a small install size and codebase.
|
||||
|
||||
Details of this project, how to use it and comparisons to other work can be found in the documentation_.
|
||||
|
||||
.. _documentation: https://fastparquet.readthedocs.io
|
||||
|
||||
Requirements
|
||||
------------
|
||||
|
||||
(all development is against recent versions in the default anaconda channels
|
||||
and/or conda-forge)
|
||||
|
||||
Required:
|
||||
|
||||
- numpy
|
||||
- pandas
|
||||
- cython >= 0.29.23 (if building from pyx files)
|
||||
- cramjam
|
||||
- fsspec
|
||||
|
||||
Supported compression algorithms:
|
||||
|
||||
- Available by default:
|
||||
|
||||
- gzip
|
||||
- snappy
|
||||
- brotli
|
||||
- lz4
|
||||
- zstandard
|
||||
|
||||
- Optionally supported
|
||||
|
||||
- `lzo <https://github.com/jd-boyd/python-lzo>`_
|
||||
|
||||
|
||||
Installation
|
||||
------------
|
||||
|
||||
Install using conda, to get the latest compiled version::
|
||||
|
||||
conda install -c conda-forge fastparquet
|
||||
|
||||
or install from PyPI::
|
||||
|
||||
pip install fastparquet
|
||||
|
||||
You may wish to install numpy first, to help pip's resolver.
|
||||
This may install an appropriate wheel, or compile from source. For the latter,
|
||||
you will need a suitable C compiler toolchain on your system.
|
||||
|
||||
You can also install latest version from github::
|
||||
|
||||
pip install git+https://github.com/dask/fastparquet
|
||||
|
||||
in which case you should also have ``cython`` to be able to rebuild the C files.
|
||||
|
||||
Usage
|
||||
-----
|
||||
|
||||
Please refer to the documentation_.
|
||||
|
||||
*Reading*
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from fastparquet import ParquetFile
|
||||
pf = ParquetFile('myfile.parq')
|
||||
df = pf.to_pandas()
|
||||
df2 = pf.to_pandas(['col1', 'col2'], categories=['col1'])
|
||||
|
||||
You may specify which columns to load, which of those to keep as categoricals
|
||||
(if the data uses dictionary encoding). The file-path can be a single file,
|
||||
a metadata file pointing to other data files, or a directory (tree) containing
|
||||
data files. The latter is what is typically output by hive/spark.
|
||||
|
||||
*Writing*
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
from fastparquet import write
|
||||
write('outfile.parq', df)
|
||||
write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
|
||||
compression='GZIP', file_scheme='hive')
|
||||
|
||||
The default is to produce a single output file with a single row-group
|
||||
(i.e., logical segment) and no compression. At the moment, only simple
|
||||
data-types and plain encoding are supported, so expect performance to be
|
||||
similar to *numpy.savez*.
|
||||
|
||||
History
|
||||
-------
|
||||
|
||||
This project forked in October 2016 from `parquet-python`_, which was not designed
|
||||
for vectorised loading of big data or parallel access.
|
||||
|
||||
.. _parquet-python: https://github.com/jcrobak/parquet-python
|
||||
|
||||
Reference in New Issue
Block a user