TokenStream contract violation: reset()/close() call missing, clojure

I was porting a piece of Java code to Clojure. The code highlights search keywords in a text.

 
 
    // `analyzer` is assumed to be a static field of the enclosing class (not shown).
    public static void highLightClojure(int id, String text, String field) {
        try {
            // Parse the query with the shared analyzer; "asddf" is only the default field name.
            Query queryToSearch = new QueryParser("asddf", analyzer).parse("read text file string utf8");
            Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(),
                    new QueryScorer(queryToSearch));
            // Tokenize the text to highlight with the same analyzer.
            TokenStream tokenStream = TokenSources.getTokenStream(field, text, analyzer);
            // Ask for at most 4 fragments, without merging contiguous ones.
            TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 4);
            for (int j = 0; j < frag.length; j++) {
                if (frag[j] != null) {
                    System.out.println("score: " + frag[j].getScore() + ", frag: " + frag[j].toString());
                }
            }
        } catch (IOException | InvalidTokenOffsetsException | ParseException e) {
            e.printStackTrace();
        }
    }
 

The Java project in Eclipse uses Lucene 6.0.0; the Clojure project uses 4.0.

 
(defn foo [field text]
  (let [analyzer    (org.apache.lucene.analysis.standard.StandardAnalyzer.)
        ;; Here the token stream is created before the query is parsed.
        tokenStream (org.apache.lucene.search.highlight.TokenSources/getTokenStream field text analyzer)
        query       (.parse (org.apache.lucene.queryparser.classic.QueryParser. "" analyzer)
                            "read text file string utf8")
        highlighter (org.apache.lucene.search.highlight.Highlighter.
                      (org.apache.lucene.search.highlight.SimpleHTMLFormatter.)
                      (org.apache.lucene.search.highlight.QueryScorer. query))
        frag        (.getBestTextFragments highlighter tokenStream text false 4)]
    frag))
 

Calling it fails with this error:

 
IllegalStateException TokenStream contract violation: reset()/close() call missing, reset() called multiple times, or subclass does not call super.reset(). Please see Javadocs of TokenStream class for more information about the correct consuming workflow.  org.apache.lucene.analysis.Tokenizer$1.read (Tokenizer.java:109)
 

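Yet the code looked identical to the Java version. My first suspicion was the Lucene version, so I upgraded Lucene in the Clojure project to 6.0.0, but the result was the same. Could Clojure itself be the problem? I packaged the Eclipse project as a jar, loaded it into Clojure, and called the Java function directly from Clojure. That worked, so Clojure was not the cause either. Strange.

Searching turned up nothing useful, because this code never calls reset() or close() itself.

Finally I compared the Java code and the Clojure code carefully once more and found the only difference: the order of the calls. It turns out that these APIs are sensitive to the order in which they are called.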

 
(defn foo [field text]
  (let [analyzer    (org.apache.lucene.analysis.standard.StandardAnalyzer.)
        ;; The query is parsed first, then the token stream is created: this order works.
        query       (.parse (org.apache.lucene.queryparser.classic.QueryParser. "" analyzer)
                            "read text file string utf8")
        tokenStream (org.apache.lucene.search.highlight.TokenSources/getTokenStream field text analyzer)
        highlighter (org.apache.lucene.search.highlight.Highlighter.
                      (org.apache.lucene.search.highlight.SimpleHTMLFormatter.)
                      (org.apache.lucene.search.highlight.QueryScorer. query))
        frag        (.getBestTextFragments highlighter tokenStream text false 4)]
    frag))
 

The key point is that tokenStream must be created after query. The following version works as well:

 
(defn foo [field text]
  (let [analyzer    (org.apache.lucene.analysis.standard.StandardAnalyzer.)
        query       (.parse (org.apache.lucene.queryparser.classic.QueryParser. "" analyzer)
                            "read text file string utf8")
        highlighter (org.apache.lucene.search.highlight.Highlighter.
                      (org.apache.lucene.search.highlight.SimpleHTMLFormatter.)
                      (org.apache.lucene.search.highlight.QueryScorer. query))
        ;; Creating the token stream after the highlighter is also fine.
        tokenStream (org.apache.lucene.search.highlight.TokenSources/getTokenStream field text analyzer)
        frag        (.getBestTextFragments highlighter tokenStream text false 4)]
    frag))
 

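I would call this a bug: nothing in the calls themselves suggests that they have to happen in a particular order, so this is a hidden dependency.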
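
A plausible explanation, as far as I can tell, is that an Analyzer does not build a fresh token stream for every call. It hands out a reused one, so parsing the query with the same analyzer consumes and closes the stream that getTokenStream had already returned. The sketch below is a toy model of that idea in plain Java and is not Lucene's code: ToyTokenStream, ToyAnalyzer and ReuseDemo are hypothetical names, and the reset/close bookkeeping is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT Lucene's API.
class ToyTokenStream {
    private String pending; // text handed over by the analyzer, not yet active
    private String input;   // text currently readable; null means "illegal state"

    void setText(String text) { pending = text; }

    // Activate the pending text. After close() nothing is pending any more.
    void reset() { input = pending; pending = null; }

    List<String> readAll() {
        if (input == null) {
            throw new IllegalStateException("contract violation: reset()/close() call missing");
        }
        List<String> tokens = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) {
            tokens.add(t);
        }
        return tokens;
    }

    void close() { input = null; pending = null; }
}

class ToyAnalyzer {
    // One stream object, reused for every call (the assumed source of the coupling).
    private final ToyTokenStream shared = new ToyTokenStream();

    ToyTokenStream tokenStream(String text) {
        shared.setText(text);
        return shared;
    }
}

class ReuseDemo {
    // Stand-in for query parsing: obtains a stream, consumes it, closes it.
    static List<String> parseQuery(ToyAnalyzer analyzer, String query) {
        ToyTokenStream ts = analyzer.tokenStream(query);
        ts.reset();
        List<String> terms = ts.readAll();
        ts.close();
        return terms;
    }

    // Stand-in for highlighting: resets and consumes a stream it was given.
    static List<String> highlight(ToyTokenStream ts) {
        ts.reset();
        List<String> tokens = ts.readAll();
        ts.close();
        return tokens;
    }

    // streamFirst = true mimics the failing order, false the working one.
    static String run(boolean streamFirst) {
        ToyAnalyzer analyzer = new ToyAnalyzer();
        try {
            ToyTokenStream ts;
            if (streamFirst) {
                ts = analyzer.tokenStream("some text");
                parseQuery(analyzer, "read text"); // closes the shared stream
            } else {
                parseQuery(analyzer, "read text");
                ts = analyzer.tokenStream("some text");
            }
            return "ok " + highlight(ts);
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));
        System.out.println(run(false));
    }
}
```

In this model, run(true) reports the contract violation and run(false) returns "ok [some, text]", which matches the behavior seen above. If the explanation holds, giving the query parser its own analyzer instance would presumably remove the ordering constraint, though I have not verified that here.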