dask.dataframe.Series.repartition
- Series.repartition(divisions=None, npartitions=None, partition_size=None, freq=None, force=False)
Repartition dataframe along new divisions
- Parameters
- divisions: list, optional
List of partitions to be used. Only used if npartitions and partition_size are not specified. For convenience, if given an integer this will defer to npartitions, and if given a string it will defer to partition_size (see below).
- npartitions: int, optional
Number of partitions of output. Only used if partition_size isn’t specified.
- partition_size: int or string, optional
Maximum number of bytes of memory for each partition. Use a number or a string such as '5MB'. If specified, npartitions and divisions will be ignored.
Warning
This keyword argument triggers computation to determine the memory size of each partition, which may be expensive.
- freq: str, pd.Timedelta
A period on which to partition timeseries data, like '7D' or '12h' or pd.Timedelta(hours=12). Assumes a datetime index.
- force: bool, default False
Allows the expansion of the existing divisions. If False then the new divisions’ lower and upper bounds must be the same as the old divisions’.
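The parameter rules above can be sketched in plain Python. This is only an illustration of the documented behaviour, not dask's implementation: the helper names `resolve_repartition_args` and `check_division_bounds` are hypothetical, and the sketch assumes that exactly one mode must remain after an integer or string `divisions` has been redirected.

```python
def resolve_repartition_args(divisions=None, npartitions=None,
                             partition_size=None, freq=None):
    """Hypothetical helper: return (mode, value) for the requested mode."""
    # Convenience rule from the docs: an integer passed as divisions means
    # npartitions, and a string means partition_size.
    if isinstance(divisions, int):
        if npartitions is not None:
            raise ValueError("divisions (as int) and npartitions both given")
        npartitions, divisions = divisions, None
    elif isinstance(divisions, str):
        if partition_size is not None:
            raise ValueError("divisions (as str) and partition_size both given")
        partition_size, divisions = divisions, None

    candidates = {
        "divisions": divisions,
        "npartitions": npartitions,
        "partition_size": partition_size,
        "freq": freq,
    }
    given = [(name, value) for name, value in candidates.items()
             if value is not None]
    # Assumption of this sketch: exactly one mode must be left at this point.
    if len(given) != 1:
        raise ValueError(
            "specify exactly one of divisions, npartitions, "
            "partition_size, or freq"
        )
    return given[0]


def check_division_bounds(old_divisions, new_divisions, force=False):
    """Hypothetical check mirroring the documented meaning of force."""
    same_bounds = (new_divisions[0] == old_divisions[0]
                   and new_divisions[-1] == old_divisions[-1])
    # With force=False the outer bounds must match the old divisions;
    # with force=True the new divisions may extend past them.
    if not force and not same_bounds:
        raise ValueError("new divisions must keep the old lower/upper bounds")
```

For example, `resolve_repartition_args(divisions=4)` yields the `npartitions` mode, and `check_division_bounds([0, 20], [0, 30])` raises unless `force=True` is passed.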
Notes
Exactly one of divisions, npartitions, partition_size, or freq should be specified. A ValueError will be raised when that is not the case.
Examples
>>> df = df.repartition(npartitions=10)
>>> df = df.repartition(divisions=[0, 5, 10, 20])
>>> df = df.repartition(freq='7d')