Release History


  • Bug fix: added ‘Extended_Type’ into control_probes.pkl output. Required by methylcheck.plot_controls().
  • Minor bug fixes and improved unit test coverage.
  • Fixed bug where process --minfi was not working with --all. Added more test coverage for CLI.
  • updated read_geo to handle more edge cases
  • deprecated some never-used functions.
    • instead of methylprep.files.idat.RunInfo use IdatDataset( verbose=True )

v1.5.0, v1.5.1

  • MAJOR refactor/overhaul of all the internal classes. This was necessary to fully support the mouse array.
  • new SigSet class object that mirrors sesame’s SigSet and SigDF objects.
  • Combines idats, manifest, and sample sheet into one object that is inherited by SampleDataContainer
  • RawDataset, MethylationDataset, ProbeSubtype all deprecated and replaced by SigSet
  • SampleDataContainer class is now basically the SigSet plus all pipeline processing settings
  • new mouse manifest covers all probes and matches sesame’s output
  • Processing will work even if a batch of IDATs has differing probe counts for the same array_type, though the differing probes may not be saved.
  • unit tests confirm that methylprep, sesame, and minfi beta values output match to within 1% of each other now. Note that the intermediate stages of processing (after NOOB and after DYE) do not match with sesame in this version. Intensities can differ by +/- 100 units, likely due to differences in order of steps and/or oob/mask probes used.


  • mouse manifest updated to conform with illumina Genome Studio / sesame probe naming convention.
  • mouse_probes.pkl now includes different probe types. Previously, if a probe type was ‘mu’ (multi) or ‘rp’ (repeat) or IlmnID started with ‘uk’ (unknown?), it was moved to experimental mouse_probes.pkl. This was about 6300 probes. Now, all ‘Multi’ and ‘Random’ probes are moved and stored in mouse_probes.pkl, about 30,000.
  • mouse manifest has a ‘design’ column with tons of human-readable notes on different probe origins, including analogous EPIC human-mapped probes.


  • pipeline CSV output will now include meth, unmeth, beta, and m-values for all probes, including failed probes.
    Versions 1.4.0 to 1.4.5 replaced these values with NaN if a probe was filtered by the quality_mask. Pickled beta, M-value, noob_meth, noob_unmeth output files will continue to exclude (i.e. show NaN for) probes that failed poobah_pval or quality_mask.


  • fixed qualityMask for epic+


  • faster circleci testing
  • mouse probes have duplicate names, which breaks the dye-bias step, so it will fall back to linear-dye when duplicates are present
  • added more mouse array test coverage


  • now uses sesame’s infer_type_I_channel function to detect and correct probe switching, if sesame=True
  • uses sesame’s nonlinear dye bias correction function, if sesame=True
    instead of the previous linear-dye-correction in the NOOB function.
  • as part of the run_pipeline(sesame=True) default ON settings, it will apply sesame’s “quality_mask”
    that automatically removes probes that are unreliable from all data.
  • reads more IDAT raw data (run_info, probe nbeads, probe standard deviation)
    • IdatDataset has new kwargs, including bit='float16' option to cut file/memory usage in half by clipping max intensity at 32127 (which cuts off ~0.01% of probes)
  • processing will mirror sesame more closely now, instead of minfi (to revert, use sesame=False in run_pipeline)
  • adds sesame quality_mask, which auto-hides known set of sketchy probes.
  • internal objects updated so that values align in every stage of processing
    (i.e. if you apply the sesame quality mask, the output files and the SampleDataContainer will exclude those probes)
  • make_pipeline provides a scikit-learn style interface, as alternative to run_pipeline


  • ensures methylprep output matches sesame output
  • order of probes in CSVs, pickles, and SampleDataContainer doesn’t match
  • fixes bug where a few probes had negative meth/unmeth values because of int16 limits.
    Now it uses unsigned int16 data type and unit tests confirm no negative values appear.
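The effect of the int16 limit can be illustrated with a small standard-library sketch. It assumes the negative values came from intensities above 32767 wrapping around when stored in a signed 16-bit type; the changelog only says "int16 limits", so the exact mechanism inside methylprep is an assumption here, and the helper names are hypothetical.

```python
import struct

def as_signed_int16(value: int) -> int:
    """Reinterpret the 16 bits of an unsigned value as a signed int16."""
    return struct.unpack("<h", struct.pack("<H", value))[0]

def as_unsigned_int16(value: int) -> int:
    """Round-trip a value through an unsigned 16-bit field (range 0..65535)."""
    return struct.unpack("<H", struct.pack("<H", value))[0]

# Illustrative raw intensities; anything above 32767 overflows a signed int16.
intensities = [1200, 32767, 32768, 40000, 65535]

signed = [as_signed_int16(v) for v in intensities]
unsigned = [as_unsigned_int16(v) for v in intensities]

print(signed)    # values above 32767 appear as negative numbers
print(unsigned)  # the same bits read as unsigned stay non-negative
```

Reading the same 16 bits as unsigned keeps every intensity in the 0 to 65535 range, which is consistent with the note that no negative values appear after the change.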


  • updated support for Illumina mouse array
  • summarized processing warnings at end, to make tqdm progress bar cleaner


  • run_pipeline() has 50% shorter processing time due to user-submitted changes
  • idats can be processed while gzipped (.idat.gz) and saved this way using --no_uncompress flag
  • ‘download’ function manages FTP connection better
  • improved unit testing: download and process_series
  • run_pipeline() function has two new optional parameters
    • poobah_decimals: if you want more than the default 3 decimals saved in poobah_values.pkl and _processed.csv files, then specify a higher limit.
    • poobah_sig: default significance level for excluding probes is 0.05. You can set it to something else in the 0.1 to 0.000001 range, if you want.
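What the two parameters control can be sketched in plain Python. This is not methylprep's implementation: the function name `mask_failed_probes`, the sample data, and the choice to treat a probe as failed when its p-value is strictly greater than `poobah_sig` are all assumptions made for illustration.

```python
import math

def mask_failed_probes(values, pvalues, poobah_sig=0.05, poobah_decimals=3):
    """Return (masked values, rounded p-values) for a dict of probes.

    Hypothetical helper: a probe whose detection p-value exceeds
    poobah_sig is replaced with NaN; p-values are rounded for storage.
    """
    masked = {}
    rounded = {}
    for probe, p in pvalues.items():
        # Rounding only affects what is saved, not the pass/fail decision.
        rounded[probe] = round(p, poobah_decimals)
        masked[probe] = math.nan if p > poobah_sig else values[probe]
    return masked, rounded

# Made-up beta values and detection p-values for three probes.
betas = {"cg01": 0.82, "cg02": 0.15, "cg03": 0.47}
pvals = {"cg01": 0.00012, "cg02": 0.2, "cg03": 0.0499}

# Defaults: only cg02 fails; cg01's p-value is saved as 0.0 at 3 decimals.
masked, rounded = mask_failed_probes(betas, pvals)

# Stricter threshold and more decimals: cg03 now fails too,
# and cg01's small p-value survives rounding.
strict, rounded5 = mask_failed_probes(betas, pvals, poobah_sig=0.01, poobah_decimals=5)
```

With the default 3 decimals, a p-value such as 0.00012 is stored as 0.0, which is the kind of case where a higher poobah_decimals setting is useful.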


  • improved methylprep’s run_pipeline() process to use less memory and avoid memory usage spikes
  • files created are also smaller (because they use float32 or int16 instead of 64 bit data)
  • output files like beta_values.pkl are automatically consolidated at end of pipeline.
    batch_size will split large batches into multiple files during processing to save memory, but the output will be merged at the end.
  • improved support for Illumina Mouse Arrays.
  • has a new all-encompassing pipeline that will read GEO data sets and convert
    any data file type into a pickle of beta_values, whether from idats or processed matrix files.
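The batch_size behaviour described above (process in smaller batches, then merge the output) can be sketched generically. The functions below are hypothetical stand-ins, not methylprep code: `process_batch` just returns a placeholder per sample where the real pipeline would compute beta values and write intermediate files.

```python
def split_into_batches(sample_names, batch_size):
    """Split a list of sample names into consecutive chunks of batch_size."""
    return [
        sample_names[i:i + batch_size]
        for i in range(0, len(sample_names), batch_size)
    ]

def process_batch(batch):
    """Stand-in for per-batch processing; returns one result per sample."""
    return {name: f"processed:{name}" for name in batch}

def consolidate(batch_results):
    """Merge per-batch results into a single mapping, preserving order."""
    merged = {}
    for result in batch_results:
        merged.update(result)
    return merged

samples = ["s1", "s2", "s3", "s4", "s5"]
batches = split_into_batches(samples, batch_size=2)

# Only one batch needs to be held in memory at a time during processing;
# the merge step produces a single combined output at the end.
merged = consolidate(process_batch(batch) for batch in batches)
```

The design point is that peak memory scales with batch_size rather than with the total number of samples, while the caller still ends up with one consolidated result.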

Older versions exist on PyPI, but have no changelog.