nixpkgs

mirror of https://github.com/NixOS/nixpkgs.git synced 2024-12-03 19:15:39 +00:00

Author	SHA1	Message	Date
Matthew Bauer	f1346f5854	tesseract: supports darwin	2017-04-23 18:08:51 -05:00
aszlig	7b5263e1a6	tesseract: Package version 4.x from Git master Tesseract 4 has got a new long short-term memory neural networking based OCR engine which really helps a lot in terms of accuracy and our VM tests. I ran the new version across a bunch of different screenshots and compared the results to the 3.x branch and it really makes a big difference, especially with various font rendering settings. The only downside of this is that version 4 hasn't been released yet and is in alpha state right now, but it will eventually get there and the only solutions that came to my mind for sticking to version 3 were really sub-par: * Use several passes with different color negation on the screenshots. * Train Tesseract 3 specifically for screenshots. This is sub-par because we'd need to do it for Tesseract 4 from scratch again. * Change the test systems so that it specifically uses only an OCR font when displaying. I've actually tried this but this also isn't accurate enough with our default font rendering setup. * Turn off special font rendering settings for our tests. In conjunction with changing to an OCR font this might work but it won't catch all the cases, because applications might use their own font rendering. Given that version 4 is faster[1] when it comes to OCR detection and also the points just mentioned I think even using the alpha version just for tests isn't going to hurt anybody. [1]: https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance Signed-off-by: aszlig <aszlig@redmoonstudios.org>	2017-04-11 03:21:46 +02:00
aszlig	c381fa9b63	tesseract: 3.04.01 -> 3.05.00 Upstream changelog: * Made some fine tuning to the hOCR output. * Added TSV as another optional output format. * Fixed ABI break introduced in 3.04.00 with the AnalyseLayout() method. * text2image tool - Enable all OpenType ligatures available in a font. This feature requires Pango 1.38 or newer. * Training tools - Replaced asserts with tprintf() and exit(1). * Fixed Cygwin compatibility. * Improved multipage tiff processing. * Improved the embedded pdf font (pdf.ttf). * Enable selection of OCR engine mode from command line. * Changed tesseract command line parameter '-psm' to '--psm'. * Added new C API for orientation and script detection, removed the old one. * Increased minimum autoconf version to 2.59. * Removed dead code. * Fixed many compiler warning. * Fixed memory and resource leaks. * Fixed some issues with the 'Cube' OCR engine. * Fixed some openCL issues. * Added option to build Tesseract with CMake build system. * Implemented CPPAN support for easy Windows building. The upstream URL of the change log is: https://github.com/tesseract-ocr/tesseract/releases/tag/3.05.00 Tested by building against the following packages that directly depend on it: * vapoursynth (with ocrSupport = true) * pyocr (fails) * vobsub2srt Also tested against the following NixOS VM tests that have OCR enabled: * nixos/tests/chromium.nix -A stable * nixos/tests/emacs-daemon.nix * nixos/tests/installer.nix -A luksroot * nixos/tests/lightdm.nix * nixos/tests/plasma5.nix * nixos/tests/sddm.nix All of the packages and tests except pyocr build/succeed on x86_64-linux. Fixing pyocr is outside of the scope of this commit and will happen very soon. Signed-off-by: aszlig <aszlig@redmoonstudios.org>	2017-04-11 03:21:32 +02:00
aszlig	288a79187c	tesseract: Reintroduce enableLanguages I've removed that attribute in `68bc260ca2`, because the language files no longer were distributed as separate files, but if we for example only want to use the English training data, the closure size of Tesseract gets quite large (around 1.2 GB), which is a bit much just to be able to run NixOS VM tests. For this reason I've also switched the VM tests back to using only the English language. Tested using the following VM tests (the ones that have OCR enabled) on x86_64-linux: * nixos/tests/chromium.nix -A stable * nixos/tests/emacs-daemon.nix * nixos/tests/installer.nix -A luksroot * nixos/tests/lightdm.nix * nixos/tests/plasma5.nix * nixos/tests/sddm.nix Signed-off-by: aszlig <aszlig@redmoonstudios.org>	2017-04-11 03:21:26 +02:00
aszlig	68bc260ca2	tesseract: 3.02.02 -> 3.04.01 From the upstream changelog: * Tesseract development is now done with Git and hosted at github.com (Previously we used Subversion as a VCS and code.google.com for hosting). So let's move over to the GitHub repository, where the organisation also includes a full repository for tessdata, so we no longer need to fetch it one-by-one. The build also got significantly simpler, because we no longer need to run autoconf, neither do we need to patch the configure script for Leptonica headers. This also has the advantage that we don't need to use the enableLanguages attribute for the test runner anymore. Full upstream changelog can be found at: https://github.com/tesseract-ocr/tesseract/blob/c4d273d33cc36e/ChangeLog Tested against all NixOS tests with enabled OCR (chromium, emacs-daemon, installer.luksroot and lightdm). Signed-off-by: aszlig <aszlig@redmoonstudios.org> Cc: @viric	2016-12-19 22:25:38 +01:00
Franz Pletz	aff1f4ab94	Use general hardening flag toggle lists The following parameters are now available: * hardeningDisable To disable specific hardening flags * hardeningEnable To enable specific hardening flags Only the cc-wrapper supports this right now, but these may be reused by other wrappers, builders or setup hooks. cc-wrapper supports the following flags: * fortify * stackprotector * pie (disabled by default) * pic * strictoverflow * format * relro * bindnow	2016-03-05 18:55:26 +01:00
Robin Gloster	ea1de67f35	tesseract: turn off format hardening	2016-02-20 22:33:10 +00:00
Mateusz Kowalczyk	6014752e73	tesseract: fix postInstall We needed to separate each of the unpack commands.	2015-05-23 02:27:47 +01:00
aszlig	adb7581459	tesseract: Allow to specify a subset of languages. Especially useful for our OCR based VM tests, where we only need the English language. By default the argument is null so all languages are included. If a list of language names is passed only those languages are enabled, for example: tesseract.override { enableLanguages = [ "eng" "spa" ]; }; This enables support for only the English and Spanish languages. Signed-off-by: aszlig <aszlig@redmoonstudios.org>	2015-05-22 07:45:59 +02:00
Mateusz Kowalczyk	03a37d5851	Add Japanese to default tesseract languages	2014-08-17 11:54:25 +01:00
Mateusz Kowalczyk	7a45996233	Turn some license strings into lib.licenses values	2014-07-28 11:31:14 +02:00
Domen Kozar	808cadd390	tesseract: simplify	2013-06-12 00:50:52 +02:00
Domen Kozar	1b64fc9360	tesseract: upgrade to 3.02.02 and add some languages	2013-06-11 19:22:30 +02:00
Florian Friesdorf	892947cd93	tesseract-3.0.1 svn path=/nixpkgs/trunk/; revision=34453	2012-06-11 10:28:28 +00:00
Lluís Batlle i Rossell	9a0a0c92c7	Adding training results files for some languages to tesseract to be able to do OCR directly. svn path=/nixpkgs/trunk/; revision=26956	2011-04-24 20:01:19 +00:00
Lluís Batlle i Rossell	626f654602	Adding tesseract, an OCR engine I just found but never tried. svn path=/nixpkgs/trunk/; revision=26952	2011-04-24 18:04:07 +00:00

16 commits
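
The enableLanguages argument introduced in adb7581459 and reintroduced in 288a79187c can be sketched as a standalone Nix expression. This is only an illustration based on the example in those commit messages: it assumes a nixpkgs revision in which the tesseract derivation accepts enableLanguages, and the `<nixpkgs>` channel lookup and the `pkgs` binding are assumptions added here, not part of the commits.

```nix
# Sketch only: assumes the tesseract derivation in the imported nixpkgs
# accepts the enableLanguages argument described in the commit messages.
let
  # Assumption: a nixpkgs channel is available on NIX_PATH.
  pkgs = import <nixpkgs> { };
in
pkgs.tesseract.override {
  # Per adb7581459, the default is null, which includes all languages.
  # Passing a list restricts the training data to the named languages.
  enableLanguages = [ "eng" "spa" ];
}
```

Restricting the list to "eng" alone is what the VM tests are described as doing in 288a79187c, to keep the closure size well below the roughly 1.2 GB mentioned there.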