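I use opencv and tess4j to OCR the IMEI text on photos of smart locks. Local testing was fine, but after deploying to an Ubuntu server, tess4j failed to initialize.
Sure enough, the .so files were missing.
So I installed tesseract via snap: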
sudo snap install --channel=edge tesseract
The .so it ships turned out to be the latest version, 4.1.0, which does not match the tess4j package.
https://github.com/nguyenq/tess4j/releases/tag/tess4j-4.3.0
(/ω\)
That tess4j release corresponds to tesseract 4.0.0.
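A mismatch like this can be caught before deployment by comparing the installed tesseract version with the one the tess4j release expects. Below is a minimal sketch, assuming the first line of `tesseract -v` looks like `tesseract 4.0.0` (as in the output further down) and assuming that an equal major.minor is a good enough compatibility rule. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TesseractVersionCheck {
    // Matches the first line of `tesseract -v`, e.g. "tesseract 4.0.0"
    private static final Pattern VERSION =
            Pattern.compile("tesseract\\s+(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    /** Returns "major.minor" parsed from the version line, or null if it does not match. */
    public static String majorMinor(String versionLine) {
        Matcher m = VERSION.matcher(versionLine.trim());
        return m.lookingAt() ? m.group(1) + "." + m.group(2) : null;
    }

    /** True when the installed library has the expected major.minor (assumed rule). */
    public static boolean matches(String versionLine, String expectedMajorMinor) {
        return expectedMajorMinor.equals(majorMinor(versionLine));
    }

    public static void main(String[] args) {
        System.out.println(matches("tesseract 4.0.0", "4.0"));
        System.out.println(matches("tesseract 4.1.0", "4.0"));
    }
}
```

With the snap build from this post the second call is the relevant one: a 4.1.0 library checked against an expected "4.0" is reported as a mismatch.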
I searched for ages and could not find a matching prebuilt Linux version, so fine:
I'll compile it myself.
Reference:
https://github.com/tesseract-ocr/tesseract/wiki/
Set up the build environment (Ubuntu 16.04)
Steps:
apt-get install automake g++ git libtool libleptonica-dev make pkg-config
Clone the 4.0.0 branch:
git clone https://github.com/tesseract-ocr/tesseract.git --branch 4.0.0 --single-branch
configure then reported that a suitable Leptonica was not available:
configure: error: Leptonica 1.74 or higher is required. Try to install libleptonica-dev package.
Install leptonica
https://github.com/DanBloomberg/leptonica/
git clone https://github.com/DanBloomberg/leptonica
cd leptonica
./autogen.sh
./configure
make
sudo make install
tops xtractprotos '/usr/local/bin'
libtool: install: /usr/bin/install -c .libs/convertfilestopdf /usr/local/bin/convertfilestopdf
libtool: install: /usr/bin/install -c .libs/convertfilestops /usr/local/bin/convertfilestops
libtool: install: /usr/bin/install -c .libs/convertformat /usr/local/bin/convertformat
libtool: install: /usr/bin/install -c .libs/convertsegfilestopdf /usr/local/bin/convertsegfilestopdf
libtool: install: /usr/bin/install -c .libs/convertsegfilestops /usr/local/bin/convertsegfilestops
libtool: install: /usr/bin/install -c .libs/converttopdf /usr/local/bin/converttopdf
libtool: install: /usr/bin/install -c .libs/converttops /usr/local/bin/converttops
libtool: install: /usr/bin/install -c .libs/fileinfo /usr/local/bin/fileinfo
libtool: install: /usr/bin/install -c .libs/imagetops /usr/local/bin/imagetops
libtool: install: /usr/bin/install -c .libs/xtractprotos /usr/local/bin/xtractprotos
Installation finished.
The .so files are in
the leptonica/src/.libs/ directory. Checking the version: liblept.so -> liblept.so.5.0.3
Back to tesseract
cd tesseract
./autogen.sh
./configure
make
sudo make install
sudo ldconfig
Build finished. The .so files are in
the tesseract/src/.libs/ directory. Checking the version shows 4.0.0:
tesseract -v
tesseract 4.0.0
leptonica-1.79.0
zlib 1.2.11
Found AVX2
Found AVX
Found SSE
Copy out the following and use them under these names:
liblept.so.5.0.3 -> liblept.so
libtesseract.so.4.0.0.0 ->libtesseract.so
/tesseract/tessdata/tessconfigs
/tesseract/tessdata/configs
plus the separately downloaded
chi_sim.traineddata
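The copy-and-rename step above can be scripted. This is a minimal sketch using only `java.nio.file`, assuming the versioned file name always contains `.so` followed by a version suffix. `NativeLibStager` and `stage` are hypothetical names, not part of tess4j.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NativeLibStager {
    /** Copies a versioned library (e.g. liblept.so.5.0.3) into targetDir under its plain name (liblept.so). */
    public static Path stage(Path versionedLib, Path targetDir) throws IOException {
        String name = versionedLib.getFileName().toString();
        int idx = name.indexOf(".so");
        if (idx < 0) {
            throw new IllegalArgumentException("not a shared library: " + name);
        }
        // Keep everything up to and including ".so", dropping the version suffix
        String plainName = name.substring(0, idx + 3);
        Files.createDirectories(targetDir);
        // Files.copy follows symlinks by default, so the real file content is copied
        return Files.copy(versionedLib, targetDir.resolve(plainName),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, `NativeLibStager.stage(Paths.get("leptonica/src/.libs/liblept.so.5.0.3"), Paths.get("nativelib"))` would produce `nativelib/liblept.so`.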
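[Figure: project structure]
Using it in code
Put the generated .so files into the project's nativelib directory.
Modify the pom so that packaging copies the libtesseract.so used by tess4j
into the classes/linux-x86-64 directory.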
<resource>
<targetPath>${project.build.directory}/classes/linux-x86-64</targetPath>
<directory>nativelib</directory>
<!-- <filtering>true</filtering> -->
<includes>
<!-- opencv native lib -->
<include>**/*.dll</include>
<include>**/*.so</include>
<include>tess4j/**</include>
</includes>
</resource>
Problems:
It turns out tesseract 4.0 itself has a bug...
C locale
https://github.com/tesseract-ocr/tesseract/issues/1670
Following the explanation here:
https://github.com/nguyenq/tess4j/issues/106#issuecomment-437361950
I tried the following workaround: set the C locale before calling tesseract.
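import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.Platform;

public interface CLibrary extends Library {
    CLibrary INSTANCE = (CLibrary) Native.loadLibrary((Platform.isWindows() ? "msvcrt" : "c"), CLibrary.class);
    // Category constants as defined by glibc on Linux; other platforms may number them differently.
    int LC_CTYPE = 0;
    int LC_NUMERIC = 1;
    int LC_ALL = 6;
    // char *setlocale(int category, const char *locale);
    String setlocale(int category, String locale);
}

// tesseract 4.0 bug: force the "C" locale before running OCR
CLibrary.INSTANCE.setlocale(CLibrary.LC_ALL, "C");
id = instance.doOCR(bimage_id);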
Nice, that error is gone.
leptonica base dependencies
Recognition still failed, now with:
Error in pixReadMemTiff: function not present
Error in pixReadMem: tiff: no pix returned
Error in pixaGenerateFontFromString: pix not made
Error in bmfCreate: font pixa not made
Error opening data file file:/home/test/t
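It looks like leptonica in turn depends on a pile of base packages: libgif, libjpeg, libpng, libtiff, zlib... (・ˍ・*)
Damn.
I ran leptonica's ./configure again and found that none of the base packages were present. (。•ˇ‸ˇ•。)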
checking for LIBPNG... no
checking for png_read_png in -lpng... no
checking png.h usability... no
checking png.h presence... no
checking for png.h... no
checking for JPEG... no
checking for jpeg_read_scanlines in -ljpeg... no
checking jpeglib.h usability... no
checking jpeglib.h presence... no
checking for jpeglib.h... no
checking for DGifOpenFileHandle in -lgif... no
checking gif_lib.h usability... no
checking gif_lib.h presence... no
checking for gif_lib.h... no
checking for LIBTIFF... no
checking for TIFFOpen in -ltiff... no
checking tiff.h usability... no
checking tiff.h presence... no
checking for tiff.h... no
checking for LIBWEBP... no
checking for WebPGetInfo in -lwebp... no
checking webp/encode.h usability... no
checking webp/encode.h presence... no
checking for webp/encode.h... no
checking for LIBJP2K... no
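A log like the one above is easier to act on when the missing libraries are listed in one place. The following is a small sketch that scans configure output for the upper-case `checking for NAME... no` probes. It assumes that line format stays exactly as shown here, and `ConfigureLogScan` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConfigureLogScan {
    // Matches probes such as "checking for LIBPNG... no"; header and function
    // probes (lower-case names, extra words) are deliberately ignored.
    private static final Pattern PROBE =
            Pattern.compile("^checking for ([A-Z0-9]+)\\.\\.\\. (yes|no)$");

    /** Returns the names of all upper-case probes whose result was "no". */
    public static List<String> missing(List<String> logLines) {
        List<String> result = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = PROBE.matcher(line.trim());
            if (m.matches() && "no".equals(m.group(2))) {
                result.add(m.group(1));
            }
        }
        return result;
    }
}
```

Feeding it the lines above would report LIBPNG, JPEG, LIBTIFF, LIBWEBP and LIBJP2K, which maps directly onto the -dev packages to install.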
Digging through the official site and Google turned up the following:
Leptonica is configured to handle image I/O using these external
libraries: libjpeg, libtiff, libpng, libz, libwebp, libgif, libopenjp2
These libraries are easy to obtain. For example, using the
Debian package manager:
sudo apt-get install <package>
where <package> = {libpng-dev, libjpeg62-turbo-dev, libtiff5-dev,
libwebp-dev, libopenjp2-7-dev, libgif-dev}
apt-get install -y libtiff5-dev libpng16-dev
Running ./configure again shows that JPEG and libtiff are now found:
checking for JPEG... yes
checking for DGifOpenFileHandle in -lgif... no
checking gif_lib.h usability... no
checking gif_lib.h presence... no
checking for gif_lib.h... no
checking for LIBTIFF... yes
Re-run make && make install.
Next problem: the trained data inside the Spring Boot jar cannot be loaded.
Error opening data file file:/home/test/ttfp/ttfplat.jar!/BOOT-INF/classes!/linux-x86-64/tess4j/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
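Fix: extract it to a regular directory and load it from there.
Missing libpng library
The png library turned out to be missing as well.
I found it at the following path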
/lib/x86_64-linux-gnu/libpng16.so
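and added it to the project.
Final setup.
All the .so files are packed into the jar.
On startup, the .so files are extracted and loaded automatically.
The tess4j trained data is extracted automatically as well.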
// opencv
CopyAndLoadNative(nativename);
if (!osName.toLowerCase(new Locale("", "", "")).startsWith("windows")) {
// tess4j
CopyAndLoadNative("libpng16.so");
CopyAndLoadNative("liblept.so");
CopyAndLoadNative("libtesseract.so");
}
try {
    /*JarFileHelper.loadRecourseFromJarByFolder("linux-x86-64/tess4j", System.getProperty("user.dir"),
    OpencvHelper.class);*/
    loadRecourseFromJarByFolder("linux-x86-64/tess4j", System.getProperty("user.dir"));
    System.out.println("tess4j data extracted");
} catch (Exception e) {
    e.printStackTrace();
    System.out.println("tess4j data extraction failed");
}
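The body of CopyAndLoadNative is not shown in this post. The following is only a guess at what such a helper could look like: copy the library from the classpath to a real file, then hand its absolute path to `System.load`. The resource layout `linux-x86-64/<name>` is taken from the pom fragment earlier; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NativeLoaderSketch {
    /** Writes the stream to dir/name and returns the resulting path. */
    public static Path extract(InputStream in, Path dir, String name) throws IOException {
        Files.createDirectories(dir);
        Path target = dir.resolve(name);
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    /** Extracts a classpath resource (assumed layout: linux-x86-64/<name>) and loads it. */
    public static void copyAndLoad(String name, Path dir) throws IOException {
        String resource = "linux-x86-64/" + name;
        try (InputStream in = NativeLoaderSketch.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("resource not found: " + resource);
            }
            // System.load needs an absolute path to a real file on disk
            System.load(extract(in, dir, name).toAbsolutePath().toString());
        }
    }
}
```

The calls above load libpng16.so, then liblept.so, then libtesseract.so. Presumably that order matters because each library depends on the one before it, so a sketch like this should be called in the same sequence.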
public static void loadRecourseFromJarByFolder(String inpath, String path) {
    String fname = OpencvHelper.class.getClassLoader().getResource(inpath).getFile();
    File filenative = new File(fname);
    // isDirectory() is only true when the resource is a real directory on disk
    if (filenative.isDirectory()) {
        for (File cfile : filenative.listFiles()) {
            try {
                loadRecourseFromJarByFolder(inpath + "/" + cfile.getName(), path);
            } catch (Exception e) {
                logger.error(e);
            }
        }
    } else {
        String nativepath = path + "/" + inpath;
        System.out.println("save to:" + nativepath);
        File foutput = new File(nativepath);
        File fold = new File(foutput.getParent());
        if (!fold.exists())
            fold.mkdirs();
        // try-with-resources closes both streams even if the copy fails
        try (InputStream reader = new BufferedInputStream(
                     OpencvHelper.class.getClassLoader().getResourceAsStream(inpath));
             FileOutputStream fo = new FileOutputStream(foutput)) {
            byte[] buffer = new byte[1024];
            int n;
            // write only the bytes actually read; writing the whole buffer pads the file with garbage
            while ((n = reader.read(buffer)) != -1) {
                fo.write(buffer, 0, n);
            }
            foutput.deleteOnExit();
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
    }
}
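A `File`-based directory check like the one in the function above only sees real directories on disk. When the resources sit inside a packaged jar, one alternative is to walk the jar entries directly. The sketch below does that with `java.util.jar.JarFile`. It is not the approach used in this post, and the prefix for a Spring Boot jar (something like `BOOT-INF/classes/linux-x86-64/tess4j/`, judging by the error message earlier) is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarFolderExtractor {
    /** Copies every file entry under prefix from the jar into targetDir; returns the number of files written. */
    public static int extractFolder(Path jar, String prefix, Path targetDir) throws IOException {
        int count = 0;
        Path root = targetDir.toAbsolutePath().normalize();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                if (entry.isDirectory() || !entry.getName().startsWith(prefix)) {
                    continue;
                }
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries that would escape the target directory
                if (!out.startsWith(root)) {
                    throw new IOException("bad entry: " + entry.getName());
                }
                Files.createDirectories(out.getParent());
                try (InputStream in = jarFile.getInputStream(entry)) {
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
                count++;
            }
        }
        return count;
    }
}
```

The extracted files keep their full entry path below the target directory, so the tessdata path handed to tess4j would have to include the prefix.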
ok!
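This article is published under the CC BY-NC-ND 4.0 license. Author: 野生的喵喵. Permalink: [Build notes for the tesseract 4.0.0 Linux .so matching tess4j-4.3.0]. Please credit the source when reposting.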