TokenStream contract violation: reset()/close() call missing (Clojure)
I was about to port a piece of Java code that does keyword highlighting to Clojure:
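```java
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.InvalidTokenOffsetsException;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.TextFragment;
import org.apache.lucene.search.highlight.TokenSources;

// `analyzer` is an Analyzer field of the enclosing class (its declaration is not shown).
public static void highLightClojure(int id, String text, String field) {
    try {
        Query queryToSearch = new QueryParser("asddf", analyzer).parse("read text file string utf8");
        Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(queryToSearch));
        TokenStream tokenStream = TokenSources.getTokenStream(field, text, analyzer);
        TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 4);
        for (int j = 0; j < frag.length; j++) {
            if (frag[j] != null) {
                System.out.println("score: " + frag[j].getScore() + ", frag: " + frag[j].toString());
            }
        }
    } catch (IOException | InvalidTokenOffsetsException | ParseException e) {
        e.printStackTrace();
    }
}
```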
In Eclipse I was using Lucene 6.0.0; the Clojure project used 4.0.
```clojure
(defn foo [field text]
  (let [analyzer    (org.apache.lucene.analysis.standard.StandardAnalyzer.)
        tokenStream (org.apache.lucene.search.highlight.TokenSources/getTokenStream field text analyzer)
        query       (.parse (org.apache.lucene.queryparser.classic.QueryParser. "" analyzer)
                            "read text file string utf8")
        highlighter (org.apache.lucene.search.highlight.Highlighter.
                      (org.apache.lucene.search.highlight.SimpleHTMLFormatter.)
                      (org.apache.lucene.search.highlight.QueryScorer. query))
        frag        (.getBestTextFragments highlighter tokenStream text false 4)]
    frag))
```
This fails with:
```
IllegalStateException TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.
  org.apache.lucene.analysis.Tokenizer$1.read (Tokenizer.java:109)
```
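The message refers to the consuming workflow described in the Javadocs of Lucene's TokenStream: call reset(), call incrementToken() until it returns false, then end() and close(). The following is only a rough illustration: a hypothetical, stdlib-only Java model of that contract. ToyTokenStreamDemo, its whitespace splitting and its error text are invented for this sketch and are not Lucene's API. It shows two of the failure modes the message lists: reading without reset(), and reading a stream that has already been closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the TokenStream consuming workflow; not Lucene code.
public class ToyTokenStreamDemo {

    static final class ToyTokenStream {
        private enum State { CREATED, RESET, CLOSED }

        private final String[] tokens;
        private State state = State.CREATED;
        private int pos;
        private String current;

        ToyTokenStream(String text) {
            String trimmed = text.trim();
            // Whitespace splitting stands in for a real tokenizer.
            this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        void reset() {
            // reset() does not revive a closed stream; the failure only surfaces on the next read.
            if (state != State.CLOSED) {
                state = State.RESET;
                pos = 0;
            }
        }

        boolean incrementToken() {
            if (state != State.RESET) {
                throw new IllegalStateException("toy contract violation: reset() missing or stream closed");
            }
            if (pos >= tokens.length) {
                return false;
            }
            current = tokens[pos++];
            return true;
        }

        String term() { return current; }

        void end() { /* in Lucene, end() records end-of-stream state such as the final offset */ }

        void close() { state = State.CLOSED; }
    }

    // The documented order: reset(), incrementToken() until false, end(), close().
    static List<String> consume(ToyTokenStream ts) {
        List<String> out = new ArrayList<>();
        try {
            ts.reset();
            while (ts.incrementToken()) {
                out.add(ts.term());
            }
            ts.end();
        } finally {
            ts.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(consume(new ToyTokenStream("read text file string utf8")));

        ToyTokenStream stale = new ToyTokenStream("read text file");
        stale.close();  // someone else already finished with this stream
        try {
            consume(stale);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Running main prints the five tokens for the well-behaved stream, then the toy error for the stream that was closed before anyone consumed it.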
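But the code looked exactly like the Java version. My first suspect was the Lucene version, so I upgraded Lucene in the Clojure project to 6.0.0, but the result was the same. Was Clojure itself the problem? I packaged the Eclipse project as a jar, loaded it into Clojure, and called the Java method directly from Clojure. That worked fine, so Clojure wasn't to blame either. Strange.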
Searching turned up nothing, all the more since this code never calls reset() or close() at all.
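Finally I compared the Java and Clojure code line by line and found that the only difference was the call order. The Java method parses the query before calling getTokenStream, while the Clojure version obtained the token stream first. These APIs turn out to be sensitive to the order in which they are called. Reordering the let bindings fixes it: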
```clojure
(defn foo [field text]
  (let [analyzer    (org.apache.lucene.analysis.standard.StandardAnalyzer.)
        query       (.parse (org.apache.lucene.queryparser.classic.QueryParser. "" analyzer)
                            "read text file string utf8")
        ;; the token stream is now obtained only after the query has been parsed
        tokenStream (org.apache.lucene.search.highlight.TokenSources/getTokenStream field text analyzer)
        highlighter (org.apache.lucene.search.highlight.Highlighter.
                      (org.apache.lucene.search.highlight.SimpleHTMLFormatter.)
                      (org.apache.lucene.search.highlight.QueryScorer. query))
        frag        (.getBestTextFragments highlighter tokenStream text false 4)]
    frag))
```
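The key point is that tokenStream must be created after query has been parsed. This is because a Lucene Analyzer reuses its TokenStream per thread. QueryParser.parse analyzes the query string by calling the same analyzer's tokenStream method, which hands back the very stream obtained earlier, points it at the query text, consumes it and closes it. The highlighter is then given a stream that has already been closed. The following ordering works as well: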
```clojure
(defn foo [field text]
  (let [analyzer    (org.apache.lucene.analysis.standard.StandardAnalyzer.)
        query       (.parse (org.apache.lucene.queryparser.classic.QueryParser. "" analyzer)
                            "read text file string utf8")
        highlighter (org.apache.lucene.search.highlight.Highlighter.
                      (org.apache.lucene.search.highlight.SimpleHTMLFormatter.)
                      (org.apache.lucene.search.highlight.QueryScorer. query))
        tokenStream (org.apache.lucene.search.highlight.TokenSources/getTokenStream field text analyzer)
        frag        (.getBestTextFragments highlighter tokenStream text false 4)]
    frag))
```
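This hypothetical, stdlib-only Java sketch makes the ordering dependency concrete. It assumes the cause is the Analyzer's per-thread reuse of a single tokenizer. ToyAnalyzer, ToyTokenizer, parseQuery and the other names are invented for illustration. They only loosely mirror Lucene's Tokenizer, which keeps a pending input that reset() moves into place and installs a failing sentinel reader on close(). wrongOrder follows the failing let above, and rightOrder follows the working ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, simplified model of Lucene's per-thread TokenStream reuse; not Lucene code.
public class ToyReuseDemo {

    // Sentinel input that fails when read, in the spirit of Tokenizer's ILLEGAL_STATE_READER.
    static final Supplier<String> ILLEGAL = () -> {
        throw new IllegalStateException("toy TokenStream contract violation");
    };

    static final class ToyTokenizer {
        private Supplier<String> input = ILLEGAL;
        private Supplier<String> inputPending = ILLEGAL;
        private List<String> tokens;
        private int pos;

        // Called by the analyzer on every tokenStream(...) call; silently replaces
        // a pending input that nobody has consumed yet.
        void setReader(Supplier<String> in) { inputPending = in; }

        void reset() {
            input = inputPending;  // after close(), this is the failing sentinel
            inputPending = ILLEGAL;
            tokens = null;
            pos = 0;
        }

        // Combines incrementToken() and the term attribute: next token, or null when exhausted.
        String next() {
            if (tokens == null) {
                tokens = new ArrayList<>();
                for (String t : input.get().trim().split("\\s+")) {
                    if (!t.isEmpty()) tokens.add(t);
                }
            }
            return pos < tokens.size() ? tokens.get(pos++) : null;
        }

        void close() { input = inputPending = ILLEGAL; }
    }

    static final class ToyAnalyzer {
        // One tokenizer per thread, handed out again on every call.
        private final ThreadLocal<ToyTokenizer> reused = ThreadLocal.withInitial(ToyTokenizer::new);

        ToyTokenizer tokenStream(String text) {
            ToyTokenizer t = reused.get();
            t.setReader(() -> text);
            return t;  // the same object every time on this thread
        }
    }

    // reset(), read until exhausted, then close(); end() is omitted in this toy.
    static List<String> consume(ToyTokenizer ts) {
        List<String> out = new ArrayList<>();
        try {
            ts.reset();
            for (String t; (t = ts.next()) != null; ) out.add(t);
        } finally {
            ts.close();
        }
        return out;
    }

    // Stand-in for QueryParser.parse, which analyzes the query text with the same analyzer.
    static List<String> parseQuery(ToyAnalyzer a, String q) { return consume(a.tokenStream(q)); }

    static List<String> wrongOrder(ToyAnalyzer a, String text, String q) {
        ToyTokenizer docStream = a.tokenStream(text); // obtained before parsing, as in the failing let
        parseQuery(a, q);                             // reuses, consumes and closes the same object
        return consume(docStream);                    // reset() picks up the sentinel, read throws
    }

    static List<String> rightOrder(ToyAnalyzer a, String text, String q) {
        parseQuery(a, q);                             // parser is finished with the shared tokenizer
        return consume(a.tokenStream(text));          // fresh input for the highlighter
    }

    public static void main(String[] args) {
        ToyAnalyzer analyzer = new ToyAnalyzer();
        try {
            wrongOrder(analyzer, "read a utf8 text file", "read text");
        } catch (IllegalStateException e) {
            System.out.println("wrong order: " + e.getMessage());
        }
        System.out.println("right order: " + rightOrder(analyzer, "read a utf8 text file", "read text"));
    }
}
```

For wrongOrder, main prints the toy contract-violation message. For rightOrder it prints `right order: [read, a, utf8, text, file]`. In this model, setReader also silently replaces the pending document text with the query text. Even a tokenizer that did not fail would therefore have analyzed the wrong string.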
You could call this a bug. Nothing in these calls suggests that one must come before the other, so it is a hidden dependency.