Syncing Solr (search engine service) and MongoDB with mongo-connector: the solution and a summary of the pitfalls I hit (for Solr 5.3.x), real-time incremental indexing between MongoDB and Solr...

    xiaoxiao  2023-09-01

    Solr configuration and MongoDB installation

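         Installing and configuring Solr is now very simple; see the official quickstart: http://lucene.apache.org/solr/quickstart.html. The official guide uses the cloud example (selected with -e); I ended up using techproducts instead. The basic commands are: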

    Note: if unzip is not installed, install it first: apt-get install unzip

     

    root@xxx:xxx# ls solr-*

    solr-5.3.1.zip  solr-5.3.1.zip

    root@xxx:xxx# unzip -q solr-5.3.1.zip

    root@xxx:xxx# cd solr-5.3.1/

    root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt

    Your current version of Java is too old to run this version of Solr

    We found version 1.7.0_79, using command '/usr/java/jdk1.7.0_79/bin/java'

    Please install latest version of Java 8 or set JAVA_HOME properly.

     

    Debug information:

    JAVA_HOME: /usr/java/jdk1.7.0_79

    Active Path:

    /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/java/jdk1.7.0_79/bin:/usr/local/mongodb/bin

     

    If you see the message above, install JDK 1.8 (Java 8) or later and make sure JAVA_HOME points to it.

     

    Once it is installed, run:

    root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt

     

    Now open http://localhost:8983/solr/#/techproducts in a browser to view the result.

    techproducts here is a SolrCore.

    1. Installing, starting and stopping Solr

    As the official docs explain, if you want to stop Solr when you are done and delete this example's data, run:

    root@xxx:/home/software/solr-5.3.1# pwd

    /home/software/solr-5.3.1

    root@xxx:/home/software/solr-5.3.1# bin/solr stop -all

    Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 1816 to stop gracefully.

     

    root@xxx:/home/software/solr-5.3.1# rm -Rf example/techproducts/

     

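    Note: if, after stopping everything, you run bin/solr start -all -noprompt (Solr then starts with its default settings), open http://localhost:8983/solr/ and add a SolrCore, Solr looks for the core under /home/software/solr-5.3.1/server/solr. If it is not there, copy /home/software/solr-5.3.1/example/techproducts/solr/techproducts to /home/software/solr-5.3.1/server/solr and rename the copy from techproducts to docdetection, for example: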

    Then set the contents of core.properties in docdetection to:

    #Written by CorePropertiesLocator
    #Fri Mar 31 18:36:50 UTC 2017
    name=docdetection
    config=solrconfig.xml
    schema=schema.xml
    dataDir=data

    To create several cores, create more directories alongside docdetection, for example:

    with the following core.properties:

    #Written by CorePropertiesLocator
    #Fri Mar 31 18:36:50 UTC 2017
    name=test
    config=solrconfig.xml
    schema=schema.xml
    dataDir=data
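The copy-and-rename steps above can also be scripted. Below is a minimal Python sketch, assuming the paths used in this post; `core_properties` and `create_core` are hypothetical helper names of my own. It copies an existing core directory into the Solr home under a new name and rewrites core.properties with the same keys shown above. One deliberate difference from the manual steps: the template's top-level data directory is not copied, so the new core starts with an empty index.

```python
import shutil
from pathlib import Path


def core_properties(name: str) -> str:
    """Render core.properties with the same keys as the example above."""
    lines = [
        f"name={name}",
        "config=solrconfig.xml",
        "schema=schema.xml",
        "dataDir=data",
    ]
    return "\n".join(lines) + "\n"


def create_core(template_dir: str, solr_home: str, name: str) -> Path:
    """Copy an existing core directory into solr_home as a new core called `name`."""
    src = Path(template_dir)
    dest = Path(solr_home) / name
    if dest.exists():
        raise FileExistsError(f"core directory already exists: {dest}")

    def skip_top_level_data(directory, names):
        # Skip only the template's own data/ (its index), not nested dirs named "data".
        return {"data"} & set(names) if Path(directory) == src else set()

    shutil.copytree(src, dest, ignore=skip_top_level_data)
    # Overwrite the copied core.properties so the core gets the new name.
    (dest / "core.properties").write_text(core_properties(name), encoding="utf-8")
    return dest
```

For example, `create_core("/home/software/solr-5.3.1/example/techproducts/solr/techproducts", "/home/software/solr-5.3.1/server/solr", "docdetection")` reproduces the manual copy; afterwards restart Solr or add the core in the admin UI.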

    Go to /home/software/solr-5.3.1/server/solr-webapp/webapp/WEB-INF and edit this entry in web.xml:

    <env-entry>
       <env-entry-name>solr/home</env-entry-name>
       <env-entry-value>/home/software/solr-5.3.1/server/solr</env-entry-value>
       <env-entry-type>java.lang.String</env-entry-type>
    </env-entry>

    Then refresh http://localhost:8983/solr to see the final result.

    2. Integrating Solr with MongoDB

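    Solr's official quickstart shows that it can index XML, JSON, CSV and many other document types, but says nothing about integrating with MongoDB. Still, people always find a way to make things work together.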

        Reference: http://www.cnblogs.com/sysuys/p/3403670.html

     

        To integrate Solr with MongoDB you need mongo-connector; see: https://github.com/10gen-labs/mongo-connector/wiki/Getting-Started

       

        mongo-connector can be downloaded from: https://github.com/mongodb-labs/mongo-connector

       

    1) Set up a MongoDB replica set

    Install python-pip and git:

    root@xxx:~# apt-get install python-pip

    root@xxx:~# apt-get install git

    Reading package lists... Done

    Building dependency tree      

    Reading state information... Done

    The following extra packages will be installed:

    root@iZm5effj2tm01xy2qqmnlzZ:~#

     

    Start mongod with a replica set name:

    mongod --replSet docdetection   (in this experiment I ran it in the background; see http://blog.csdn.net/tototuzuoquan/article/details/55805811)

     

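    Stopping and starting MongoDB is easy: started as above, mongod runs in the foreground and Ctrl+C stops it; if you append & when starting it, it runs in the background and you have to stop it with pkill or kill.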

     

    Then initialize the replica set in the mongo shell:

    root@xxx:/etc# mongo

    MongoDB shell version: 3.2.11

    connecting to: test

    Server has startup warnings:

    > rs.initiate();

    {

             "info2" : "no configuration specified. Using a default configuration for the set",

             "me" : "iZm5effj2tm01xy2qqmnlzZ:27017",

             "ok" : 1

    }

    docdetection:OTHER>

    That is all on the MongoDB side; it is simple, since all that is needed is a replica set.

    2) Installing mongo-connector

    2.1) Installing mongo-connector (recommended)

    For installation, see https://github.com/10gen-labs/mongo-connector; it is very simple and takes a single command:

    If you want mongo-connector to run as a background service, you can install it with the steps below:

    Edit config.json to review it:

    root@xxx:/home/software/mongo-connector-master# pwd

    /home/software/mongo-connector-master

     

    root@xxx:/home/software/mongo-connector-master# pip install mongo_connector[solr]

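    Note that syncing to Solr later requires a doc manager, which must be installed separately; with only the installation above, syncing fails with an error. After searching online for a long time, I finally found the link in mongo-connector's README.rst: https://github.com/mongodb-labs/solr-doc-manager (other doc managers are installed the same way).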

     

    If solr-doc-manager is not installed, run: pip install solr-doc-manager
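Since a missing doc manager only shows up as an error once syncing starts, it can help to check beforehand. A small Python sketch; the module names are an assumption inferred from the `-d solr_doc_manager` argument used later, and `module_available`/`missing_modules` are hypothetical helpers. `importlib.util.find_spec` locates a module without importing it (for a dotted name, the parent packages are imported). Run it with the same Python that pip installed into.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be found on the current Python path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises this when a parent package of a dotted name is missing.
        return False


# Assumed module names: mongo-connector itself and the Solr doc manager
# that `-d solr_doc_manager` refers to.
REQUIRED = [
    "mongo_connector",
    "mongo_connector.doc_managers.solr_doc_manager",
]


def missing_modules(names=REQUIRED) -> list:
    """List the modules from `names` that cannot be found."""
    return [n for n in names if not module_available(n)]
```

For example, `print(missing_modules() or "mongo-connector and solr-doc-manager are installed")`.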

     

    To find where mongo-connector is installed, use:

    root@xxx:/etc/init.d# find / -name mongo-connector

    /home/software/mongo-connector-master/scripts/mongo-connector

    /usr/local/bin/mongo-connector

    root@xxx:/etc/init.d#

     

    The following installs the elastic2-doc-manager doc manager (for Elasticsearch; Solr does not need it):

    root@xxx:/home/software/mongo-connector-master# pip install elastic2-doc-manager

     

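    Note: if python-pip is missing, just install it with apt-get. But do not start mongo-connector yet: it reads Solr's configuration, so finish the Solr setup first; after that, running it is a single command.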

     

    Note: guides online explain how to install with pip but not how to uninstall. pip's help shows:

    root@xxx:/home/software/mongo-connector-master# pip  --help

     

    Usage:  

      pip <command> [options]

    You can uninstall it like this:

    root@xxx:/home/software/mongo-connector-master# pip uninstall mongo-connector

    2.2) A second way to install mongo-connector:

    git clone https://github.com/10gen-labs/mongo-connector.git

    cd mongo-connector

    # Before installing, edit mongo_connector/constants.py and set DEFAULT_COMMIT_INTERVAL = 0

    python setup.py install

     

    2.3) A third way (not recommended): download mongo-connector-master.zip from https://github.com/mongodb-labs/mongo-connector

    root@xxx:/home/software# unzip mongo-connector-master.zip

    root@xxx:/home/software# cd mongo-connector-master/

    root@xxx:/home/software/mongo-connector-master# chmod +x setup.py

    root@xxx:/home/software/mongo-connector-master# python setup.py install

    root@xxx:/home/software/mongo-connector-master# python setup.py install_service

    running install_service

    creating /var/log/mongo-connector

    copying ./config.json -> /etc/mongo-connector.json

    copying ./scripts/mongo-connector -> /etc/init.d

     

    root@xxx:/home/software/mongo-connector-master# chmod +x /etc/init.d/mongo-connector

    Run the following to make sure the system startup configuration is updated:

    root@xxx:/home/software/mongo-connector-master# update-rc.d mongo-connector defaults

    update-rc.d: warning: default start runlevel arguments (2 3 4 5) do not match mongo-connector Default-Start values (3 4 5)

     Adding system startup for /etc/init.d/mongo-connector ...

       /etc/rc0.d/K20mongo-connector -> ../init.d/mongo-connector

       /etc/rc1.d/K20mongo-connector -> ../init.d/mongo-connector

       /etc/rc6.d/K20mongo-connector -> ../init.d/mongo-connector

       /etc/rc2.d/S20mongo-connector -> ../init.d/mongo-connector

       /etc/rc3.d/S20mongo-connector -> ../init.d/mongo-connector

       /etc/rc4.d/S20mongo-connector -> ../init.d/mongo-connector

       /etc/rc5.d/S20mongo-connector -> ../init.d/mongo-connector

    root@iZm5effj2tm01xy2qqmnlzZ:/home/software/mongo-connector-master#

     

    To remove the background service, run:

    python setup.py uninstall_service

     

    This command removes /etc/init.d/mongo-connector and /etc/mongo-connector.json.

     

    3) Configuring the Solr side:

    Find schema.xml and edit it:

    root@xxx:/home/software/solr-5.3.1# find ./ -name "schema.xml"

    ./example/example-DIH/solr/rss/conf/schema.xml

    ./example/example-DIH/solr/tika/conf/schema.xml

    ./example/example-DIH/solr/solr/conf/schema.xml

    ./example/example-DIH/solr/mail/conf/schema.xml

    ./example/example-DIH/solr/db/conf/schema.xml

    ./example/techproducts/solr/techproducts/conf/schema.xml

    ./server/solr/configsets/sample_techproducts_configs/conf/schema.xml

    ./server/solr/configsets/basic_configs/conf/schema.xml

    root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1#

     

    Open:

    vi ./server/solr/configsets/sample_techproducts_configs/conf/schema.xml

    Change the following (to search in vi: press Esc, then type :/<uniqueKey>):

    <uniqueKey>id</uniqueKey>

    to use the underscore-prefixed _id, i.e. <uniqueKey>_id</uniqueKey>.

    Then add the following fields (in vi, press Esc, then Shift+G to jump to the last line):

    <field name="_id" type="string" indexed="true" stored="true" />

    <field name="_ts" type="long" indexed="true" stored="true" />

    <field name="ns" type="string" indexed="true" stored="true"/>
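The `_ts` and `ns` fields are filled in by mongo-connector itself. To my understanding of its source (the `bson_ts_to_long` helper in its util module), `_ts` stores the oplog Timestamp packed into one 64-bit integer, seconds in the high 32 bits and the increment counter in the low 32 bits, which is why the field is declared as `long`; `ns` holds the namespace `database.collection`. The sketch below only re-implements that arithmetic for illustration; `pack_ts`, `unpack_ts` and `split_ns` are my own names, not mongo-connector's API, and `docdb.documents` is a hypothetical namespace.

```python
def pack_ts(seconds: int, increment: int) -> int:
    """Pack an oplog Timestamp (seconds, increment) into one 64-bit value.

    Seconds go in the high 32 bits, the increment in the low 32 bits.
    """
    if not (0 <= seconds < 2**32 and 0 <= increment < 2**32):
        raise ValueError("seconds and increment must fit in 32 bits")
    return (seconds << 32) + increment


def unpack_ts(value: int) -> tuple:
    """Inverse of pack_ts: recover (seconds, increment) from a stored _ts."""
    return value >> 32, value & 0xFFFFFFFF


def split_ns(ns: str) -> tuple:
    """Split a namespace into (database, collection).

    Database names cannot contain dots, so split on the first dot only;
    the collection part may itself contain dots.
    """
    database, _, collection = ns.partition(".")
    return database, collection
```

Current Unix time in seconds is below 2^31, so the packed value stays inside the signed 64-bit range of Solr's long type.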

     

    After adding them, the result looks like this:

     

    Comment out the original id field (to find it in vi: press Esc, then type :/name="id"):

    <!-- 

    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />

    -->

    Screenshot:

    Otherwise every JSON or XML document added to Solr must contain an id field, because that field is declared with required="true".

    That is all for schema.xml.
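The schema.xml edits above can be applied with a script as well. A minimal sketch using Python's standard `xml.etree.ElementTree`; `patch_schema` is a hypothetical helper. It switches `<uniqueKey>` to `_id`, adds `_id`, `_ts` and `ns` if they are missing, and removes the old `id` field instead of commenting it out. ElementTree discards XML comments when it rewrites a file, so keep a copy of the original schema.xml.

```python
import xml.etree.ElementTree as ET

# Fields the post adds for mongo-connector: (name, fieldType).
CONNECTOR_FIELDS = [("_id", "string"), ("_ts", "long"), ("ns", "string")]


def patch_schema(xml_text: str) -> str:
    """Apply this post's schema.xml changes and return the rewritten XML."""
    root = ET.fromstring(xml_text)
    # <field> elements may sit in a <fields> wrapper or directly under <schema>.
    parent = root.find("fields")
    if parent is None:
        parent = root

    # Drop the old required "id" field (the post comments it out instead).
    for field in parent.findall("field"):
        if field.get("name") == "id":
            parent.remove(field)

    # Add _id, _ts and ns unless they are already declared.
    existing = {f.get("name") for f in parent.findall("field")}
    for name, ftype in CONNECTOR_FIELDS:
        if name not in existing:
            ET.SubElement(parent, "field", name=name, type=ftype,
                          indexed="true", stored="true")

    # Point uniqueKey at MongoDB's _id.
    key = root.find("uniqueKey")
    if key is None:
        key = ET.SubElement(root, "uniqueKey")
    key.text = "_id"
    return ET.tostring(root, encoding="unicode")
```

Running it twice is harmless: fields that already exist are not added again.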

     

    Edit solrconfig.xml

    Open:

    vi ./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml

    Find the following line (if the class below is configured incorrectly, Solr and MongoDB will not sync either; see the official guide: https://github.com/mongodb-labs/mongo-connector/wiki/Usage-with-Solr#make-sure-the-lukerequesthandler-is-enabled):

    <requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

     

    Uncomment it, or add the line if it is not there. mongo-connector depends on it: it requests the schema defined in schema.xml above, and this handler serves that request, so it matters a lot.
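Likewise, here is a sketch that makes sure solrconfig.xml declares the `/admin/luke` handler with the class shown above. `ensure_luke_handler` is a hypothetical helper; if the handler exists with a different class it is corrected, otherwise it is added. The same caveat applies: ElementTree drops comments, so back up the file first.

```python
import xml.etree.ElementTree as ET

LUKE_CLASS = "org.apache.solr.handler.admin.LukeRequestHandler"


def ensure_luke_handler(xml_text: str) -> str:
    """Return solrconfig.xml text that declares /admin/luke with LUKE_CLASS."""
    root = ET.fromstring(xml_text)
    for handler in root.findall("requestHandler"):
        if handler.get("name") == "/admin/luke":
            # Present already: make sure the class is the one mongo-connector needs.
            handler.set("class", LUKE_CLASS)
            break
    else:
        # Not declared at all: add it directly under <config>.
        ET.SubElement(root, "requestHandler",
                      {"name": "/admin/luke", "class": LUKE_CLASS})
    return ET.tostring(root, encoding="unicode")
```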

     

    Finally:

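        Finally, as described earlier, stop Solr, delete the example/techproducts directory and start Solr again. Restarting the techproducts example produces some errors because schema.xml now uses _id instead of id as the uniqueKey; these errors can be ignored (if no errors appear at all, something went wrong). Afterwards you will find that the two edited configuration files have been copied into example/techproducts as that example's configuration, as described above.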

    root@xxx:/home/software/solr-5.3.1# cd /home/software/solr-5.3.1

    root@xxx:/home/software/solr-5.3.1# bin/solr stop -all

    Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 1816 to stop gracefully.

     

    root@xxx:/home/software/solr-5.3.1#

    root@xxx:/home/software/solr-5.3.1# bin/solr start -e techproducts -noprompt

     

    4) Connecting Solr and MongoDB with mongo-connector

    Now run the following, using the -t URL of the core you want to sync (techproducts or docdetection); mongo-connector reference: http://blog.csdn.net/hyman_yx/article/details/51684218

    mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/techproducts -d solr_doc_manager &

    mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager &
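For scripting, the same command can be assembled in Python. This sketch only uses the four options that appear in the commands above; `connector_command` and `start_connector` are hypothetical helpers. As I understand mongo-connector's documentation, `--auto-commit-interval=0` makes it commit after every write, which matches DEFAULT_COMMIT_INTERVAL = 0 in method 2.2.

```python
import subprocess


def connector_command(mongo="localhost:27017",
                      solr_url="http://localhost:8983/solr/docdetection",
                      doc_manager="solr_doc_manager",
                      auto_commit_interval=0):
    """Build the mongo-connector argument list used in this post.

    -m: MongoDB address, -t: target Solr core URL, -d: doc manager,
    --auto-commit-interval: seconds between commits (0 = after every write).
    """
    return [
        "mongo-connector",
        "-m", mongo,
        f"--auto-commit-interval={auto_commit_interval}",
        "-t", solr_url,
        "-d", doc_manager,
    ]


def start_connector(**kwargs):
    """Start mongo-connector as a child process without waiting for it."""
    return subprocess.Popen(connector_command(**kwargs))
```

`start_connector()` is roughly what appending `&` does in the shell; passing an argument list rather than one string avoids shell quoting issues.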

    Note:

    If re-indexing sometimes does not work properly, run the commands below (the index must also be deleted and rebuilt):

    root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1# rm -rf mongo-connector.log

    root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1# rm -rf oplog.timestamp

    root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1/server/solr/docdetection/data# pwd

    /home/software/solr-5.3.1/server/solr/docdetection/data

    root@iZm5effj2tm01xy2qqmnlzZ:/home/software/solr-5.3.1/server/solr/docdetection/data# rm -rf *
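The cleanup above can also be scripted. A minimal sketch, assuming mongo-connector was started from /home/software/solr-5.3.1 (so its oplog.timestamp and mongo-connector.log live there) and the core's index lives in server/solr/docdetection/data; `reset_sync` is a hypothetical helper. Run it with Solr and mongo-connector stopped. As far as I know, once oplog.timestamp is gone, mongo-connector has no saved position and starts again with a full dump of the collections.

```python
import shutil
from pathlib import Path

# Files mongo-connector leaves in the directory it was started from (per this post).
CONNECTOR_STATE_FILES = ("oplog.timestamp", "mongo-connector.log")


def reset_sync(connector_dir: str, index_data_dir: str) -> list:
    """Delete mongo-connector's progress files and empty the core's data dir.

    Equivalent to the rm -rf commands above; returns the removed paths.
    """
    removed = []
    for name in CONNECTOR_STATE_FILES:
        path = Path(connector_dir) / name
        if path.exists():
            path.unlink()
            removed.append(path)
    # Empty the data directory but keep the directory itself, like `rm -rf *`.
    for child in Path(index_data_dir).iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)
        else:
            child.unlink()
        removed.append(child)
    return removed
```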

    The result after running them:

    Viewing the content mongo-connector has indexed:

    After all the steps above, the synced documents finally show up (the configuration works):

    The corresponding content in MongoDB:
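Besides the admin UI, the synced documents can be checked over HTTP with Solr's standard `/select` handler. A sketch using only the Python standard library; `select_url`, `num_found` and `count_docs` are hypothetical helpers, and `docdb.documents` stands for whatever database.collection you sync. Comparing the count with the collection's count in the mongo shell is a quick way to confirm that everything arrived.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(core_url: str, query: str = "*:*", rows: int = 10) -> str:
    """Build a /select URL that returns matching documents as JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{core_url.rstrip('/')}/select?{params}"


def num_found(response_body: str) -> int:
    """Extract the hit count from a Solr JSON response."""
    return json.loads(response_body)["response"]["numFound"]


def count_docs(core_url: str, query: str = "*:*") -> int:
    """Ask a running Solr core how many documents match `query` (rows=0)."""
    with urlopen(select_url(core_url, query, rows=0)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

For example, with Solr running: `count_docs("http://localhost:8983/solr/docdetection", "ns:docdb.documents")`.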

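    If Solr does not seem to pick up synced data automatically, the cause is Solr's default commit settings, which never make new documents visible on their own: the hard autoCommit runs every 15 seconds with openSearcher=false, and autoSoftCommit is disabled (maxTime -1). Adjust the commit settings in solrconfig.xml as follows.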

    Go to the Solr directory (mine is /home/software/solr-5.3.1):

    cd /home/software/solr-5.3.1

    Find solrconfig.xml:

    find ./ -name solrconfig.xml returns:

    ./example/files/conf/solrconfig.xml

    ./example/example-DIH/solr/rss/conf/solrconfig.xml

    ./example/example-DIH/solr/tika/conf/solrconfig.xml

    ./example/example-DIH/solr/solr/conf/solrconfig.xml

    ./example/example-DIH/solr/mail/conf/solrconfig.xml

    ./example/example-DIH/solr/db/conf/solrconfig.xml

    ./example/techproducts/solr/techproducts/conf/solrconfig.xml

    ./server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml

    ./server/solr/configsets/basic_configs/conf/solrconfig.xml

    ./server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

     

    In four of the files above (edited one by one as Modifications 1 to 4 below), change the following content:

    <autoCommit>

       <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>

       <openSearcher>false</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

    <autoSoftCommit>

    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>

    </autoSoftCommit>

     

    Modification 1: vim example/techproducts/solr/techproducts/conf/solrconfig.xml

    <autoCommit>

    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>

        <openSearcher>false</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

     

    <autoSoftCommit>

           <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>

    </autoSoftCommit>

     

    Change it to:

    <autoCommit>

    <maxTime>300000</maxTime>

        <maxDocs>10000</maxDocs>

        <openSearcher>true</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

    <autoSoftCommit>

        <maxDocs>1000</maxDocs>

        <maxTime>60000</maxTime>

    </autoSoftCommit>

     

    Modification 2: vim server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml

    <autoCommit>

        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>

        <openSearcher>false</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

     

    <autoSoftCommit>

        <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>

    </autoSoftCommit>

     

    Change it to:

    <autoCommit>

        <maxTime>300000</maxTime>

        <maxDocs>10000</maxDocs>

        <openSearcher>true</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

    <autoSoftCommit>

        <maxDocs>1000</maxDocs>

        <maxTime>60000</maxTime>

    </autoSoftCommit>

     

    Modification 3: vim server/solr/configsets/basic_configs/conf/solrconfig.xml

    <autoCommit>

         <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>

         <openSearcher>false</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

         'soft' commit which only ensures that changes are visible

         but does not ensure that data is synced to disk.  This is

         faster and more near-realtime friendly than a hard commit.

    -->

    <autoSoftCommit>

         <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>

    </autoSoftCommit>

     

    Change it to:

    <autoCommit>

        <maxTime>300000</maxTime>

        <maxDocs>10000</maxDocs>

        <openSearcher>true</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

    <autoSoftCommit>

        <maxDocs>1000</maxDocs>

        <maxTime>60000</maxTime>

    </autoSoftCommit>

     

    Modification 4: vim server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml

    <autoCommit>

        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>

    <openSearcher>false</openSearcher>

    </autoCommit>

     

    Change it to:

    <autoCommit>

        <maxTime>300000</maxTime>

    <maxDocs>10000</maxDocs>

        <openSearcher>true</openSearcher>

    </autoCommit>

     

    <!-- softAutoCommit is like autoCommit except it causes a

             'soft' commit which only ensures that changes are visible

             but does not ensure that data is synced to disk.  This is

             faster and more near-realtime friendly than a hard commit.

    -->

    <autoSoftCommit>

    <maxDocs>1000</maxDocs>

        <maxTime>60000</maxTime>

    </autoSoftCommit>
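Because the same edit is repeated in four files, a script reduces the chance of missing one. A minimal sketch with ElementTree, assuming the Solr directory layout used in this post; `patch_commit_settings` and `patch_files` are hypothetical helpers. It writes the values chosen above into `<autoCommit>` and `<autoSoftCommit>` under `<updateHandler>`, replacing the `${solr.autoCommit.maxTime:15000}`-style defaults just as the manual edit does, and saves a .bak copy first because ElementTree drops comments. Solr triggers each commit when either maxDocs or maxTime is reached, whichever comes first.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Values chosen above: hard commit every 300 s or 10,000 docs and open a new
# searcher; soft commit (visibility only) every 60 s or 1,000 docs.
HARD_COMMIT = {"maxTime": "300000", "maxDocs": "10000", "openSearcher": "true"}
SOFT_COMMIT = {"maxDocs": "1000", "maxTime": "60000"}

# The four files edited above, relative to the Solr directory.
TARGETS = [
    "example/techproducts/solr/techproducts/conf/solrconfig.xml",
    "server/solr/configsets/sample_techproducts_configs/conf/solrconfig.xml",
    "server/solr/configsets/basic_configs/conf/solrconfig.xml",
    "server/solr/configsets/data_driven_schema_configs/conf/solrconfig.xml",
]


def patch_commit_settings(xml_text: str) -> str:
    """Set <autoCommit>/<autoSoftCommit> under <updateHandler> to the values above."""
    root = ET.fromstring(xml_text)
    handler = root.find("updateHandler")
    if handler is None:
        raise ValueError("no <updateHandler> element found")
    for section_tag, values in (("autoCommit", HARD_COMMIT),
                                ("autoSoftCommit", SOFT_COMMIT)):
        section = handler.find(section_tag)
        if section is None:
            section = ET.SubElement(handler, section_tag)
        for tag, text in values.items():
            child = section.find(tag)
            if child is None:
                child = ET.SubElement(section, tag)
            # Replaces e.g. ${solr.autoCommit.maxTime:15000} with a fixed value.
            child.text = text
    return ET.tostring(root, encoding="unicode")


def patch_files(solr_root: str, targets=TARGETS) -> None:
    """Patch each target in place, keeping a .bak copy of the original."""
    for rel in targets:
        path = Path(solr_root) / rel
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        new_text = patch_commit_settings(path.read_text(encoding="utf-8"))
        path.write_text(new_text, encoding="utf-8")
```

For example, `patch_files("/home/software/solr-5.3.1")`, then restart Solr so the cores reload their configuration.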

    Note: to index additional MongoDB keys in Solr, you must declare them in the SolrCore's schema.xml. Here is an example:

    <?xml version="1.0" encoding="UTF-8" ?>

    <schema name="example" version="1.5"> 

     <types>     

      <fieldType name="string" class="solr.StrField" sortMissingLast="true" />  

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>  

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>      

    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>      

    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>     

    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>  

    <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>  

    <!--  

            <fieldType name="text_ik" class="solr.TextField">       

    <analyzer class="org.wltea.analyzer.lucene.IKAnalyzer"/>      

    </fieldType>  

    -->    

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">  

    <analyzer type="index">

    <tokenizer class="solr.StandardTokenizerFactory"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

    <!-- in this example, we will only use synonyms at query time

    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>

    -->

    <filter class="solr.LowerCaseFilterFactory"/>  

      </analyzer>  

    <analyzer type="query">

    <tokenizer class="solr.StandardTokenizerFactory"/>

    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>

    <filter class="solr.LowerCaseFilterFactory"/>  

    </analyzer>      

    </fieldType>  

    </types>    

    <fields>      

    <field name="_version_" type="long" indexed="true" stored="true"/>      

    <field name="_id" type="string" indexed="true" stored="true" />      

    <field name="_ts" type="long" indexed="true" stored="true" />     

    <field name="ns" type="string" indexed="true" stored="true"/>   

    <field name="docLibrayId" type="string" indexed="true" stored="true"/>      

    <field name="originalDocPath" type="string" indexed="true" stored="true"/>      

    <field name="htmlDocPath" type="string" indexed="true" stored="true" />      

    <field name="originalFileName" type="string" indexed="true" stored="true"/>      

    <field name="majorId" type="string" indexed="true" stored="true"/>   

    <field name="majorName" type="string" indexed="true" stored="true"/>      

    <field name="propertyId" type="string" indexed="true" stored="true"/>      

    <field name="propertyName" type="string" indexed="true" stored="true"/>      

    <field name="wordNum" type="int" indexed="true" stored="true"/>      

    <field name="paragNum" type="int" indexed="true" stored="true"/>      

    <field name="sentenceNum" type="int" indexed="true" stored="true"/>  

    <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>  

    </fields>  

    <uniqueKey>_id</uniqueKey>  

    <defaultSearchField>majorName</defaultSearchField>  

    <solrQueryParser defaultOperator="OR"/>

    </schema>
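Before restarting Solr, a quick check of the schema can catch the problems described in this post. A sketch using ElementTree; `check_schema` is a hypothetical helper. It verifies that uniqueKey is `_id`, that `_id`, `_ts` and `ns` exist with the types used above, and, as a general sanity check, that every field's type is declared as a fieldType.

```python
import xml.etree.ElementTree as ET

# Fields mongo-connector needs, with the types used in this post.
REQUIRED_FIELDS = {"_id": "string", "_ts": "long", "ns": "string"}


def check_schema(xml_text: str) -> list:
    """Return a list of problems found in schema.xml (an empty list means OK)."""
    root = ET.fromstring(xml_text)
    problems = []
    if (root.findtext("uniqueKey") or "").strip() != "_id":
        problems.append("uniqueKey is not _id")
    # iter() also finds elements nested in <types>/<fields> wrappers.
    types = {t.get("name") for t in root.iter("fieldType")}
    fields = {f.get("name"): f.get("type") for f in root.iter("field")}
    for name, ftype in REQUIRED_FIELDS.items():
        if name not in fields:
            problems.append(f"missing field {name}")
        elif fields[name] != ftype:
            problems.append(f"field {name} has type {fields[name]}, expected {ftype}")
    for name, ftype in fields.items():
        if ftype not in types:
            problems.append(f"field {name} uses undeclared type {ftype}")
    return problems
```

For example, `check_schema(open("conf/schema.xml", encoding="utf-8").read())` (path relative to the core directory) returns an empty list for the schema above.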

    Go back to the directory where you ran the mongo-connector command:

    mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager &

    Find oplog.timestamp there and delete it; you can also delete mongo-connector.log.

    Go to the index directory:

    cd /home/software/solr-5.3.1/server/solr/docdetection/data

    Delete all the generated index files with rm -rf * (make sure you are in /home/software/solr-5.3.1/server/solr/docdetection/data).

    Then:

    Restart Solr (the commands are earlier in this post), then run:

    mongo-connector -m localhost:27017 --auto-commit-interval=0 -t http://localhost:8983/solr/docdetection -d solr_doc_manager &
