nixpkgs

alioth/nixpkgs

forked from mirrors/nixpkgs

Commit graph

Author

SHA1

Message

Date

Ryan Mulligan

d4b9752212

tesseract_4: 4.00.00alpha-git-20170410 -> 4.0.0

The 4.0.0 stable release is out.

Changelog: https://github.com/tesseract-ocr/tesseract/wiki/4.0x-Changelog

2018-11-24 15:25:59 -08:00

Matthew Justin Bauer

2eacddf0dc

treewide: homepage URL fixes (#28475)

* pgadmin: use https homepage

* msn-pecan: move homepage to github

google code is now unavailable

* pidgin-latex: use https for homepage

* pidgin-opensteamworks: use github for homepage

google code is unavailable

* putty: use https for homepage

* ponylang: use https for homepage

* picolisp: use https for homepage

* phonon: use https for homepage

* pugixml: use https for homepage

* pioneer: use https for homepage

* packer: use https for homepage

* pokerth: use https for homepage

* procps-ng: use https for homepage

* pycaml: use https for homepage

* proot: move homepage to .github.io

* pius: use https for homepage

* pdfread: use https for homepage

* postgresql: use https for homepage

* ponysay: move homepage to new site

* prometheus: use https for homepage

* powerdns: use https for homepage

* pm-utils: use https for homepage

* patchelf: move homepage to https

* tesseract: move homepage to github

* quodlibet: move homepage from google code

* jbrout: move homepage from google code

* eiskaltdcpp: move homepage to github

* nodejs: use https for homepage

* nix: use https for homepage

* pdf2djvu: move homepage from google code

* game-music-emu: move homepage from google code

* vacuum: move homepage from google code

2017-08-22 20:50:04 +02:00

aszlig

7b5263e1a6

tesseract: Package version 4.x from Git master

Tesseract 4 has a new OCR engine based on long short-term memory (LSTM)
neural networks, which really helps a lot in terms of accuracy and our VM
tests.

I ran the new version across a bunch of different screenshots and
compared the results to the 3.x branch, and it really makes a big
difference, especially with various font rendering settings.

The only downside of this is that version 4 hasn't been released yet and
is in alpha state right now, but it will eventually get there, and the
only alternatives I could think of that stick with version 3 were really
sub-par:

 * Use several passes with different color negation on the screenshots.
 * Train Tesseract 3 specifically for screenshots. This is sub-par
   because we'd need to do it for Tesseract 4 from scratch again.
 * Change the test systems so that they specifically use *only* an OCR
   font when displaying. I've actually tried this but this also isn't
   accurate enough with our default font rendering setup.
 * Turn off special font rendering settings for our tests. In
   conjunction with changing to an OCR font this might work but it won't
   catch all the cases, because applications might use their own font
   rendering.

Given that version 4 is faster[1] when it comes to OCR detection, and
given the points just mentioned, I think even using the alpha version just
for tests isn't going to hurt anybody.

[1]: https://github.com/tesseract-ocr/tesseract/wiki/4.0-Accuracy-and-Performance

Signed-off-by: aszlig <aszlig@redmoonstudios.org>

2017-04-11 03:21:46 +02:00
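
The first alternative listed in the commit message above (several OCR passes with different color negation) could look roughly like the sketch below. This is only an illustration under stated assumptions: the image is modelled as rows of 8-bit grayscale values, and `ocr` is a hypothetical callable standing in for a real engine invocation; neither the names nor the approach come from the nixpkgs test driver.

```python
from typing import Callable, List

# Assumption: an image is a list of rows of 8-bit grayscale values (0-255).
Image = List[List[int]]


def negate(image: Image) -> Image:
    """Return a color-negated copy of an 8-bit grayscale image."""
    return [[255 - px for px in row] for row in image]


def ocr_with_negation_passes(image: Image, ocr: Callable[[Image], str]) -> List[str]:
    """Run `ocr` on the original image and on its negated variant.

    `ocr` is a hypothetical stand-in for a real OCR engine call. Results are
    returned in pass order, with empty and duplicate strings dropped.
    """
    results: List[str] = []
    for variant in (image, negate(image)):
        text = ocr(variant).strip()
        if text and text not in results:
            results.append(text)
    return results
```

A caller would then search every returned string for the expected text. Each extra pass costs one more OCR run per screenshot, which is one practical reason this option compares poorly with simply using a more accurate engine.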

3 commits