Build notes: compiling the tesseract 4.0.0 Linux .so files that match tess4j-4.3.0
2019-03-25 15:56:22


 

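I use OpenCV and tess4j to recognize the IMEI text in photos of smart locks. Local tests were fine, but after deploying to an Ubuntu server, tess4j threw an exception during initialization.

Sure enough, the .so files were missing.

So I installed tesseract with snap: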

sudo snap install --channel=edge tesseract

It turned out the installed .so was the latest version, 4.1.0, which does not match the tess4j package.

 

 

https://github.com/nguyenq/tess4j/releases/tag/tess4j-4.3.0

(/ω\)

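The matching tesseract version is 4.0.0.

I searched for a long time and found no prebuilt Linux binaries for that version. Fine,

I'll compile it myself.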

 

 

 

Reference:

https://github.com/tesseract-ocr/tesseract/wiki/

 

Set up the build environment (Ubuntu 16.04)

 

Steps:

 

 

apt-get install automake g++ git libtool libleptonica-dev make pkg-config

 

git clone

git clone https://github.com/tesseract-ocr/tesseract.git --branch 4.0.0 --single-branch

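configure then reported that leptonica was missing (the libleptonica-dev package installed above is apparently older than the required 1.74):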

configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.

Install leptonica from source:

https://github.com/DanBloomberg/leptonica/

 

git clone https://github.com/DanBloomberg/leptonica

cd leptonica
./autogen.sh
./configure
make
sudo make install
tops xtractprotos '/usr/local/bin'
libtool: install: /usr/bin/install -c .libs/convertfilestopdf /usr/local/bin/convertfilestopdf
libtool: install: /usr/bin/install -c .libs/convertfilestops /usr/local/bin/convertfilestops
libtool: install: /usr/bin/install -c .libs/convertformat /usr/local/bin/convertformat
libtool: install: /usr/bin/install -c .libs/convertsegfilestopdf /usr/local/bin/convertsegfilestopdf
libtool: install: /usr/bin/install -c .libs/convertsegfilestops /usr/local/bin/convertsegfilestops
libtool: install: /usr/bin/install -c .libs/converttopdf /usr/local/bin/converttopdf
libtool: install: /usr/bin/install -c .libs/converttops /usr/local/bin/converttops
libtool: install: /usr/bin/install -c .libs/fileinfo /usr/local/bin/fileinfo
libtool: install: /usr/bin/install -c .libs/imagetops /usr/local/bin/imagetops
libtool: install: /usr/bin/install -c .libs/xtractprotos /usr/local/bin/xtractprotos

Installation finished.

The .so files are in

the leptonica/src/.libs/ directory. Checking the version there shows liblept.so -> liblept.so.5.0.3

 

Back to tesseract

 

cd tesseract
./autogen.sh
./configure
make
sudo make install
sudo ldconfig

 

Build finished. The .so files are in

the tesseract/src/.libs/ directory. Checking the version confirms 4.0.0:

tesseract -v

tesseract 4.0.0
 leptonica-1.79.0
  zlib 1.2.11
 Found AVX2
 Found AVX
 Found SSE

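Copy out the following for use (built file -> name used in the project):

liblept.so.5.0.3 -> liblept.so

libtesseract.so.4.0.0.0 -> libtesseract.so

/tesseract/tessdata/tessconfigs

/tesseract/tessdata/configs

as well as the separately downloaded trained data file: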

chi_sim.traineddata

 

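The project structure is as follows:

 

Usage in code

Put the generated .so files into the project's nativelib directory.

 

Modify the pom file so that packaging copies the libtesseract.so used by tess4j

into the classes/linux-x86-64 directory.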

<resource>
	<targetPath>${project.build.directory}/classes/linux-x86-64</targetPath>
	<directory>nativelib</directory>
	<!-- <filtering>true</filtering> -->
	<includes>
		<!-- opencv native lib -->
		<include>**/*.dll</include>
		<include>**/*.so</include>
		<include>tess4j/**</include>
	</includes>
</resource>

 

 

Problems:

It turns out tesseract 4.0 itself has a bug...

The C locale

https://github.com/tesseract-ocr/tesseract/issues/1670

 

 

Following the explanation here:

https://github.com/nguyenq/tess4j/issues/106#issuecomment-437361950

I tried the following workaround: set the process locale to "C" before running OCR.

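// Requires JNA: com.sun.jna.Library, com.sun.jna.Native, com.sun.jna.Platform
public interface CLibrary extends Library {
	CLibrary INSTANCE = (CLibrary) Native.loadLibrary((Platform.isWindows() ? "msvcrt" : "c"), CLibrary.class);

	// Category values as defined by glibc on Linux;
	// other C runtimes (e.g. msvcrt) number them differently.
	int LC_CTYPE = 0;
	int LC_NUMERIC = 1;
	int LC_ALL = 6;

	// char *setlocale(int category, const char *locale);
	String setlocale(int category, String locale);
}

// Workaround for the tesseract 4.0 bug: force the "C" locale before OCR
CLibrary.INSTANCE.setlocale(CLibrary.LC_ALL, "C");

id = instance.doOCR(bimage_id);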

Nice, that error is gone.
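Why would a native library care about the locale at all? One plausible reason (my reading, not something taken from Tesseract's source) is that number formatting and parsing depend on it, so the same value can come out as "1.5" or "1,5". The sketch below shows that effect in plain Java. The class name LocaleDecimalDemo is made up for this illustration and has nothing to do with tess4j.

```java
import java.util.Locale;

// Illustration only: shows how the decimal separator depends on the locale.
// LocaleDecimalDemo is a hypothetical name, not part of tess4j or Tesseract.
final class LocaleDecimalDemo {

    /** Formats a value with one decimal place using the given locale. */
    static String format(Locale locale, double value) {
        return String.format(locale, "%.1f", value);
    }

    public static void main(String[] args) {
        // Locale.ROOT is locale-neutral and uses a dot as decimal separator.
        System.out.println(format(Locale.ROOT, 1.5));    // prints 1.5
        // German formatting uses a comma instead.
        System.out.println(format(Locale.GERMANY, 1.5)); // prints 1,5
    }
}
```

Forcing the "C" locale through setlocale, as in the workaround above, removes this kind of variation on the native side.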

 

leptonica's base dependencies

Recognition then failed with another error.

Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Error opening data file file:/home/test/t

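Apparently leptonica in turn depends on a pile of base packages: libgif, libjpeg, libpng, libtiff, zlib... (・ˍ・*)

 

Damn.

Running ./configure in the leptonica directory showed that none of these base packages were present. (。•ˇ‸ˇ•。)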

checking for LIBPNG... no
checking for png_read_png in -lpng... no
checking png.h usability... no
checking png.h presence... no
checking for png.h... no
checking for JPEG... no
checking for jpeg_read_scanlines in -ljpeg... no
checking jpeglib.h usability... no
checking jpeglib.h presence... no
checking for jpeglib.h... no
checking for DGifOpenFileHandle in -lgif... no
checking gif_lib.h usability... no
checking gif_lib.h presence... no
checking for gif_lib.h... no
checking for LIBTIFF... no
checking for TIFFOpen in -ltiff... no
checking tiff.h usability... no
checking tiff.h presence... no
checking for tiff.h... no
checking for LIBWEBP... no
checking for WebPGetInfo in -lwebp... no
checking webp/encode.h usability... no
checking webp/encode.h presence... no
checking for webp/encode.h... no
checking for LIBJP2K... no

 

Digging through the official site and Google turned up the following:

   Leptonica is configured to handle image I/O using these external
   libraries: libjpeg, libtiff, libpng, libz, libwebp, libgif, libopenjp2

   These libraries are easy to obtain.  For example, using the
   Debian package manager:
       sudo apt-get install <package>
   where <package> = {libpng-dev, libjpeg62-turbo-dev, libtiff5-dev,
                      libwebp-dev, libopenjp2-7-dev, libgif-dev}

 

apt-get install -y libtiff5-dev   libpng16-dev

Running ./configure again shows that JPEG and libtiff are now found~~

checking for JPEG... yes
checking for DGifOpenFileHandle in -lgif... no
checking gif_lib.h usability... no
checking gif_lib.h presence... no
checking for gif_lib.h... no
checking for LIBTIFF... yes

 

Run make and make install again.

 

Next, the trained data inside the Spring Boot jar could not be loaded.

Error opening data file file:/home/test/ttfp/ttfplat.jar!/BOOT-INF/classes!/linux-x86-64/tess4j/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'

 

The fix: extract the data to an ordinary directory first, then load it from there.
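The path in the error log has the form ...jar!/BOOT-INF/classes!/..., which names an entry inside an archive rather than a file on disk, so it cannot be opened as an ordinary file path. A minimal sketch of how one might detect this case before handing a location to native code is shown below. ResourceLocation and needsExtraction are hypothetical names for this illustration, and the sample paths are made up.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Sketch only: decides whether a classpath resource must be copied to disk
// before native code can open it. Names here are hypothetical.
final class ResourceLocation {

    /** Returns true when the URL does not point at a plain file on disk. */
    static boolean needsExtraction(URL url) {
        return !"file".equals(url.getProtocol());
    }

    /** Convenience overload for URL strings. */
    static boolean needsExtraction(String spec) {
        try {
            return needsExtraction(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A resource in an unpacked classes directory is a real file.
        System.out.println(needsExtraction("file:/home/test/tessdata/chi_sim.traineddata"));
        // A resource packed inside a jar is only an archive entry.
        System.out.println(needsExtraction("jar:file:/home/test/app.jar!/tess4j/chi_sim.traineddata"));
    }
}
```

In real code the URL would come from getClassLoader().getResource(...). When the check returns true, the resource has to be copied out to a normal directory and that directory used as the data path.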

 

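Missing libpng library

Then a png library turned out to be missing as well.

I found it at the following path:

/lib/x86_64-linux-gnu/libpng16.so

and bundled it with the project.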

 

 

 

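Final setup.

All .so files are packed into the jar.

On startup, the .so files are extracted and loaded automatically.

The tess4j trained data is extracted automatically as well.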

 

// opencv
CopyAndLoadNative(nativename);

if (!osName.toLowerCase(Locale.ROOT).startsWith("windows")) {
	// tess4j: load the dependencies first, libtesseract last
	CopyAndLoadNative("libpng16.so");
	CopyAndLoadNative("liblept.so");
	CopyAndLoadNative("libtesseract.so");
}

try {
	// Extract the tess4j data folder from the classpath into the working directory
	loadRecourseFromJarByFolder("linux-x86-64/tess4j", System.getProperty("user.dir"));
	System.out.println("tess4j data extracted");
} catch (Exception e) {
	e.printStackTrace();
	System.out.println("tess4j data extraction failed");
}

 

// Requires: java.io.File, java.io.FileOutputStream, java.io.InputStream, java.io.OutputStream
public static void loadRecourseFromJarByFolder(String inpath, String path) {

	// Note: listing a folder through File only works when the resource
	// resolves to a real directory on disk.
	String fname = OpencvHelper.class.getClassLoader().getResource(inpath).getFile();
	File filenative = new File(fname);

	if (filenative.isDirectory()) {
		// Recurse into every entry of the resource folder.
		for (File cfile : filenative.listFiles()) {
			try {
				loadRecourseFromJarByFolder(inpath + "/" + cfile.getName(), path);
			} catch (Exception e) {
				logger.error(e);
			}
		}
		return;
	}

	String nativepath = path + "/" + inpath;
	System.out.println("save to:" + nativepath);

	File foutput = new File(nativepath);
	File fold = foutput.getParentFile();
	if (!fold.exists())
		fold.mkdirs();

	// try-with-resources closes both streams even when copying fails
	try (InputStream in = OpencvHelper.class.getClassLoader().getResourceAsStream(inpath);
			OutputStream out = new FileOutputStream(foutput)) {

		byte[] buffer = new byte[1024];
		int n;
		// Write only the bytes actually read; writing the whole buffer
		// each time would pad the end of the file with garbage.
		while ((n = in.read(buffer)) != -1) {
			out.write(buffer, 0, n);
		}

		foutput.deleteOnExit();

	} catch (Exception e) {
		System.err.println(e.getMessage());
	}
}
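The copy step can also be written with the standard library instead of a hand-rolled buffer loop. The following is a minimal sketch, assuming Java 7 or later. StreamToFile, copyTo and demo are hypothetical names, and the in-memory byte array stands in for a real classpath resource.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: stream-to-file copy via java.nio.file.Files.
final class StreamToFile {

    /** Copies the whole stream to target, creating parent directories as needed. */
    static long copyTo(InputStream in, Path target) {
        try {
            Path parent = target.getParent();
            if (parent != null) {
                Files.createDirectories(parent);
            }
            // Files.copy writes exactly the bytes read and returns their count.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Self-check: copies 1500 bytes (not a multiple of 1024) and returns the file size. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("tess4j-demo");
            Path target = dir.resolve("linux-x86-64/tess4j/sample.bin");
            copyTo(new ByteArrayInputStream(new byte[1500]), target);
            long size = Files.size(target);
            Files.delete(target);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 1500
    }
}
```

The 1500-byte input is chosen on purpose: a loop that always writes the full 1024-byte buffer would produce a 2048-byte file here, while this version keeps the exact size.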

ok!


 2019-03-28 16:54:16 

  This article is published under the CC BY-NC-ND 4.0 license. Author: 野生的喵喵. Permalink: 【tess4j-4.3.0对应 tesseract 4.0.0 linux so版本编译记录】. Please credit the source when reposting.


