Many companies still build their big data platforms on HDP, and with Apache Kyuubi's rising popularity, integrating Apache Kyuubi into an existing HDP deployment has become a pressing need for many teams. The integration mainly involves secondary development of Ambari. This article walks through the integration process in detail.

1. Integration Version Information

Service   Version
Kyuubi    1.4.1-incubating
Ambari    2.7.3
HDP       3.1.0
OS        CentOS 7.4.1708

Background: on top of HDP 3.1.0, our company replaced all stack components with their Apache releases and then integrated Apache Kyuubi.

The versions of the main Apache components are as follows:

Service      Version
HDFS         3.3.0
YARN         3.3.0
MapReduce2   3.3.0
Hive         3.1.2
Spark        3.1.1

2. Integration Steps

Adding a custom component consists of two major parts: packaging the component's files into an RPM, and adding the component's configuration, start/stop scripts, and other metadata to Ambari.

2.1 Building the RPM Package

When Ambari installs or integrates a big data component, the component must be packaged in rpm format.

2.1.1 Download and Extract Apache Kyuubi

Download page: https://kyuubi.apache.org/releases.html

For this integration we chose version 1.4.1-incubating.

Run tar zxf apache-kyuubi-1.4.1-incubating-bin.tgz. The layout of the extracted Kyuubi distribution:

apache-kyuubi-1.4.1-incubating-bin
├── DISCLAIMER
├── LICENSE
├── NOTICE
├── RELEASE
├── beeline-jars
├── bin
├── conf
│   ├── kyuubi-defaults.conf.template
│   ├── kyuubi-env.sh.template
│   └── log4j2.properties.template
├── docker
│   ├── Dockerfile
│   ├── helm
│   ├── kyuubi-configmap.yaml
│   ├── kyuubi-deployment.yaml
│   ├── kyuubi-pod.yaml
│   └── kyuubi-service.yaml
├── externals
│   └── engines
├── jars
├── licenses
├── logs
├── pid
└── work

2.1.2 Setting Up the RPM Build Environment

Install the rpm-build package:

yum install rpm-build

Install rpmdevtools:

yum install rpmdevtools

Create the workspace:

rpmdev-setuptree

rpmdev-setuptree creates the following workspace under $HOME/rpmbuild:

/root/rpmbuild
├── BUILD
├── RPMS
├── SOURCES
├── SPECS
└── SRPMS

2.1.3 Building the RPM

2.1.3.1 Edit the Spec File

An rpm is built from a spec file. Based on the extracted Kyuubi directory layout, write a spec file that lists every directory and file to be packaged. An excerpt:

%description
kyuubi

%files
%dir %attr(0755, root, root) "/usr/hdp/3.1.0.0-78/kyuubi"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/DISCLAIMER"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/LICENSE"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/NOTICE"
%attr(0644, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/RELEASE"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/beeline-jars"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/logs"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/pid"
%dir %attr(0777, root, root) "/usr/hdp/3.1.0.0-78/kyuubi/work"

%changelog
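The excerpt above shows only the %description and %files sections. For completeness, a minimal spec preamble consistent with the package name and BUILDROOT directory used in the next step might look like this (a sketch; the Summary and License fields are our own placeholders):

Name:        kyuubi_3_1_0_0_78
Version:     2.3.2.3.1.0.0
Release:     78
Summary:     Apache Kyuubi 1.4.1-incubating packaged for HDP 3.1.0.0-78
License:     Apache-2.0
BuildArch:   x86_64
# The tarball ships prebuilt binaries; skip automatic dependency scanning
AutoReqProv: no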

2.1.3.2 Stage the Files

Create the target directory:

cd /root/rpmbuild/BUILDROOT
mkdir -p /root/rpmbuild/BUILDROOT/kyuubi_3_1_0_0_78-2.3.2.3.1.0.0-78.x86_64/usr/hdp/3.1.0.0-78/kyuubi

Change to the directory extracted from apache-kyuubi-1.4.1-incubating-bin.tgz and copy everything into the build root:

cp -r * /root/rpmbuild/BUILDROOT/kyuubi_3_1_0_0_78-2.3.2.3.1.0.0-78.x86_64/usr/hdp/3.1.0.0-78/kyuubi

Put the kyuubi.spec file into the /root/rpmbuild/SPECS directory:

cp kyuubi.spec /root/rpmbuild/SPECS/

2.1.3.3 Run the Build

cd /root/rpmbuild/SPECS
rpmbuild -ba kyuubi.spec

Check the rpm package

The built rpm is placed under /root/rpmbuild/RPMS.


2.1.4 Update the YUM Repository

Copy the rpm built in the previous step into the corresponding directory of the target yum repository, then run the update there:

createrepo --update ./

2.2 Integrating Apache Kyuubi into Ambari

2.2.1 Custom Service Directory Structure

Adding a custom service in Ambari requires a set of configuration files and scripts. Taking the Spark service directory as an example, the layout is roughly:

SPARK
├── configuration
├── package
│   ├── scripts
│   ├── templates
│   └── alerts
├── quicklinks
├── metrics.json
├── kerberos.json
└── metainfo.xml

configuration

This directory holds the service's property definition files, which back the service's configuration page in the Ambari UI; default values, value types, descriptions, and so on are defined here.

package/scripts

This directory holds the scripts for service operations such as start, stop, and service check.

package/templates

This directory holds templates for the component's configuration files, matching the property definitions under configuration. When a property is changed on the Ambari configuration page, the new value is automatically filled into the files rendered from these templates, so they always carry the latest values, which are what the service actually reads.

package/alerts

This directory holds alert definitions, for example alerts for a process being down or for runtime problems.

quicklinks

This directory holds the quick-link definitions through which the Ambari UI can jump to the service's own web pages.

metrics.json

Defines the metrics-related configuration.

kerberos.json

Defines the Kerberos authentication configuration.

metainfo.xml

This file is very important: it defines the service name, service type, operation scripts, service components, metrics, quick links, and other metadata.

2.2.2 Adding the Kyuubi Component

In our scenario, Kyuubi is mainly used to replace the Spark Thrift Server, so the Kyuubi component is integrated into the existing Spark service rather than added as a standalone service.


2.2.2.1 Edit metainfo.xml

First, edit metainfo.xml in the ambari-server/src/main/resources/stacks/HDP/3.0/services/SPARK directory to add the Kyuubi component to the Spark service:

<component>
  <name>KYUUBI_SEVER</name>
  <displayName>Kyuubi Server</displayName>
  <category>SLAVE</category>
  <cardinality>0+</cardinality>
  <versionAdvertised>true</versionAdvertised>
  <dependencies>
    <dependency>
      <name>HDFS/HDFS_CLIENT</name>
      <scope>host</scope>
      <auto-deploy>
        <enabled>true</enabled>
      </auto-deploy>
    </dependency>
    <dependency>
      <name>MAPREDUCE2/MAPREDUCE2_CLIENT</name>
      <scope>host</scope>
      <auto-deploy>
        <enabled>true</enabled>
      </auto-deploy>
    </dependency>
    <dependency>
      <name>YARN/YARN_CLIENT</name>
      <scope>host</scope>
      <auto-deploy>
        <enabled>true</enabled>
      </auto-deploy>
    </dependency>
    <dependency>
      <name>SPARK/SPARK_CLIENT</name>
      <scope>host</scope>
      <auto-deploy>
        <enabled>true</enabled>
      </auto-deploy>
    </dependency>
    <dependency>
      <name>HIVE/HIVE_METASTORE</name>
      <scope>cluster</scope>
      <auto-deploy>
        <enabled>true</enabled>
      </auto-deploy>
    </dependency>
  </dependencies>
  <commandScript>
    <script>scripts/kyuubi_server.py</script>
    <scriptType>PYTHON</scriptType>
    <timeout>600</timeout>
  </commandScript>
  <logs>
    <log>
      <logId>kyuubi_server</logId>
      <primary>true</primary>
    </log>
  </logs>
</component>

Here kyuubi_server.py defines the install, configure, start, stop, and status operations for the component.

<configuration-dependencies>
  <config-type>core-site</config-type>
  <config-type>spark-defaults</config-type>
  <config-type>spark-env</config-type>
  <config-type>spark-log4j-properties</config-type>
  <config-type>spark-metrics-properties</config-type>
  <config-type>spark-thrift-sparkconf</config-type>
  <config-type>spark-hive-site-override</config-type>
  <config-type>spark-thrift-fairscheduler</config-type>
  <config-type>kyuubi-defaults</config-type>
  <config-type>kyuubi-env</config-type>
  <config-type>kyuubi-log4j-properties</config-type>
  <config-type>ranger-spark-audit</config-type>
  <config-type>ranger-spark-security</config-type>
</configuration-dependencies>

Inside the <configuration-dependencies></configuration-dependencies> tag, add the Kyuubi-related config types: kyuubi-defaults, kyuubi-env, kyuubi-log4j-properties, ranger-spark-audit, and ranger-spark-security.

<osSpecific>
  <osFamily>redhat7,amazonlinux2,redhat6,suse11,suse12</osFamily>
  <packages>
    <package>
      <name>spark2_${stack_version}</name>
    </package>
    <package>
      <name>spark2_${stack_version}-python</name>
    </package>
    <package>
      <name>kyuubi_${stack_version}</name>
    </package>
  </packages>
</osSpecific>

Inside <package></package> tags, add the name of the Kyuubi rpm package built earlier.

2.2.2.2 Edit kyuubi_server.py

#!/usr/bin/env python
import os
from resource_management import *

class KyuubiServer(Script):
    def install(self, env):
        self.install_packages(env)

    def configure(self, env, upgrade_type=None, config_dir=None):
        import kyuubi_params
        env.set_params(kyuubi_params)

        Directory([kyuubi_params.kyuubi_log_dir, kyuubi_params.kyuubi_pid_dir, kyuubi_params.kyuubi_metrics_dir, kyuubi_params.kyuubi_operation_log_dir],
                  owner=kyuubi_params.kyuubi_user,
                  group=kyuubi_params.kyuubi_group,
                  mode=0775,
                  create_parents=True
                  )

        kyuubi_defaults = dict(kyuubi_params.config['configurations']['kyuubi-defaults'])

        PropertiesFile(format("{kyuubi_conf_dir}/kyuubi-defaults.conf"),
                       properties=kyuubi_defaults,
                       key_value_delimiter=" ",
                       owner=kyuubi_params.kyuubi_user,
                       group=kyuubi_params.kyuubi_group,
                       mode=0644
                       )

        # create kyuubi-env.sh in kyuubi install dir
        File(os.path.join(kyuubi_params.kyuubi_conf_dir, 'kyuubi-env.sh'),
             owner=kyuubi_params.kyuubi_user,
             group=kyuubi_params.kyuubi_group,
             content=InlineTemplate(kyuubi_params.kyuubi_env_sh),
             mode=0644,
             )

        # create log4j.properties in kyuubi install dir
        File(os.path.join(kyuubi_params.kyuubi_conf_dir, 'log4j.properties'),
             owner=kyuubi_params.kyuubi_user,
             group=kyuubi_params.kyuubi_group,
             content=kyuubi_params.kyuubi_log4j_properties,
             mode=0644,
             )

    def start(self, env, upgrade_type=None):
        import kyuubi_params
        env.set_params(kyuubi_params)

        self.configure(env)
        Execute(kyuubi_params.kyuubi_start_cmd,
                user=kyuubi_params.kyuubi_user,
                environment={'JAVA_HOME': kyuubi_params.java_home})

    def stop(self, env, upgrade_type=None):
        import kyuubi_params
        env.set_params(kyuubi_params)
        self.configure(env)

        Execute(kyuubi_params.kyuubi_stop_cmd,
                user=kyuubi_params.kyuubi_user,
                environment={'JAVA_HOME': kyuubi_params.java_home})

    def status(self, env):
        import kyuubi_params
        env.set_params(kyuubi_params)
        check_process_status(kyuubi_params.kyuubi_pid_file)

    def get_user(self):
        import kyuubi_params
        return kyuubi_params.kyuubi_user

    def get_pid_files(self):
        import kyuubi_params
        return [kyuubi_params.kyuubi_pid_file]

if __name__ == "__main__":
    KyuubiServer().execute()

kyuubi_server.py defines the logic for installing, configuring, starting, and stopping the Kyuubi service.

kyuubi_params.py defines the configuration variables and related parameters it relies on; for brevity it is not reproduced in full here.
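For reference, here is a minimal sketch of what kyuubi_params.py might contain. The variable names match those referenced by kyuubi_server.py above; the install path, pid file name, metrics/operation-log locations, and some command-context keys are assumptions that may need adjusting for your Ambari version:

#!/usr/bin/env python
# kyuubi_params.py -- a minimal sketch; install path, pid file name and some
# command-context keys are assumptions, adjust them to your environment.
from resource_management.libraries.functions.default import default
from resource_management.libraries.functions.format import format
from resource_management.libraries.script.script import Script

config = Script.get_config()

# Install layout (assumption: the rpm installs under the HDP stack root)
kyuubi_home = '/usr/hdp/3.1.0.0-78/kyuubi'
kyuubi_conf_dir = format("{kyuubi_home}/conf")

# Values defined in kyuubi-env.xml
kyuubi_user = config['configurations']['kyuubi-env']['kyuubi_user']
kyuubi_group = config['configurations']['kyuubi-env']['kyuubi_group']
kyuubi_log_dir = config['configurations']['kyuubi-env']['kyuubi_log_dir']
kyuubi_pid_dir = config['configurations']['kyuubi-env']['kyuubi_pid_dir']
kyuubi_env_sh = config['configurations']['kyuubi-env']['content']
kyuubi_log4j_properties = config['configurations']['kyuubi-log4j-properties']['content']

# Extra directories created by configure() (locations are assumptions)
kyuubi_metrics_dir = format("{kyuubi_log_dir}/metrics")
kyuubi_operation_log_dir = format("{kyuubi_log_dir}/operation_logs")

# JAVA_HOME from the command context (key location differs across Ambari versions)
java_home = default('/ambariLevelParams/java_home', default('/hostLevelParams/java_home', '/usr/java/default'))

# Resolves the {{cluster_zookeeper_quorum}} placeholder in kyuubi-defaults
zookeeper_hosts = default('/clusterHostInfo/zookeeper_server_hosts', default('/clusterHostInfo/zookeeper_hosts', []))
zookeeper_port = default('/configurations/zoo.cfg/clientPort', '2181')
cluster_zookeeper_quorum = ','.join('%s:%s' % (host, zookeeper_port) for host in zookeeper_hosts)

# Resolves the {{kyuubi_authentication}} placeholder from the cluster security flag
security_enabled = str(config['configurations']['cluster-env']['security_enabled']).lower() == 'true'
kyuubi_authentication = 'KERBEROS' if security_enabled else 'NONE'

# Daemon control; bin/kyuubi ships with the Kyuubi distribution
kyuubi_start_cmd = format("{kyuubi_home}/bin/kyuubi start")
kyuubi_stop_cmd = format("{kyuubi_home}/bin/kyuubi stop")
# Must match the pid file that bin/kyuubi writes under KYUUBI_PID_DIR (assumption)
kyuubi_pid_file = format("{kyuubi_pid_dir}/kyuubi-{kyuubi_user}-org.apache.kyuubi.server.KyuubiServer.pid")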

2.2.2.3 Edit kyuubi-defaults.xml

<?xml version="1.0" encoding="UTF-8"?><configuration supports_final="true">    <property>        <name>kyuubi.ha.zookeeper.quorum</name>        <value>{{cluster_zookeeper_quorum}}</value>        <description>            The connection string for the zookeeper ensemble        </description>        <on-ambari-upgrade add="true"/>    </property>
    <property>        <name>kyuubi.frontend.thrift.binary.bind.port</name>        <value>10009</value>        <description>            Port of the machine on which to run the thrift frontend service via binary protocol.        </description>        <on-ambari-upgrade add="true"/>    </property>
    <property>        <name>kyuubi.ha.zookeeper.session.timeout</name>        <value>600000</value>        <description>            The timeout(ms) of a connected session to be idled        </description>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>kyuubi.session.engine.initialize.timeout</name>        <value>300000</value>        <description>            Timeout for starting the background engine, e.g. SparkSQLEngine.        </description>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>kyuubi.authentication</name>        <value>{{kyuubi_authentication}}</value>        <description>            Client authentication types        </description>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>spark.master</name>        <value>yarn</value>        <description>            The deploying mode of spark application.        </description>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>spark.submit.deployMode</name>        <value>cluster</value>        <description>spark submit deploy mode</description>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>spark.yarn.queue</name>        <value>default</value>        <description>            The name of the YARN queue to which the application is submitted.        </description>        <depends-on>            <property>                <type>capacity-scheduler</type>                <name>yarn.scheduler.capacity.root.queues</name>            </property>        </depends-on>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>spark.yarn.driver.memory</name>        <value>4g</value>        <description>spark yarn driver momory</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>spark.executor.memory</name>        <value>4g</value>        <description>spark.executor.memory</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>spark.sql.extensions</name>        <value>org.apache.submarine.spark.security.api.RangerSparkSQLExtension</value>        <description>spark sql ranger extension</description>        <on-ambari-upgrade add="false"/>    </property></configuration>

The kyuubi.ha.zookeeper.quorum property is set to {{cluster_zookeeper_quorum}}, which is automatically replaced with the current ZooKeeper ensemble when Kyuubi is installed.

The kyuubi.authentication property is set to {{kyuubi_authentication}}; at install time the scripts check whether Kerberos is enabled on the cluster and fill in the corresponding value (e.g. KERBEROS when Kerberos is enabled, NONE otherwise, as in the kyuubi_params.py sketch above).

2.2.2.4 Edit kyuubi-env.xml

<?xml version="1.0"?><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration supports_adding_forbidden="true">    <property>        <name>kyuubi_user</name>        <display-name>Kyuubi User</display-name>        <value>spark</value>        <property-type>USER</property-type>        <value-attributes>            <type>user</type>            <overridable>false</overridable>            <user-groups>                <property>                    <type>cluster-env</type>                    <name>user_group</name>                </property>                <property>                    <type>kyuubi-env</type>                    <name>kyuubi_group</name>                </property>            </user-groups>        </value-attributes>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>kyuubi_group</name>        <display-name>Kyuubi Group</display-name>        <value>spark</value>        <property-type>GROUP</property-type>        <description>kyuubi group</description>        <value-attributes>            <type>user</type>        </value-attributes>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>kyuubi_log_dir</name>        <display-name>Kyuubi Log directory</display-name>        <value>/var/log/kyuubi</value>        <description>Kyuubi Log Dir</description>        <value-attributes>            <type>directory</type>        </value-attributes>        <on-ambari-upgrade add="true"/>    </property>    <property>        <name>kyuubi_pid_dir</name>        <display-name>Kyuubi PID directory</display-name>        <value>/var/run/kyuubi</value>        <value-attributes>            <type>directory</type>        </value-attributes>        <on-ambari-upgrade add="true"/>    </property>
    <!-- kyuubi-env.sh -->    <property>        <name>content</name>        <description>This is the jinja template for kyuubi-env.sh file</description>        <value>#!/usr/bin/env bash
export JAVA_HOME={{java_home}}export HADOOP_CONF_DIR=/etc/hadoop/confexport SPARK_HOME=/usr/hdp/current/spark-clientexport SPARK_CONF_DIR=/etc/spark/confexport KYUUBI_LOG_DIR={{kyuubi_log_dir}}export KYUUBI_PID_DIR={{kyuubi_pid_dir}}        </value>        <value-attributes>            <type>content</type>        </value-attributes>        <on-ambari-upgrade add="true"/>    </property></configuration>

This file mainly sets path-related variables such as JAVA_HOME, HADOOP_CONF_DIR, KYUUBI_LOG_DIR, and KYUUBI_PID_DIR.

2.2.2.5 Edit kyuubi-log4j-properties.xml

<?xml version="1.0" encoding="UTF-8"?><configuration supports_final="false" supports_adding_forbidden="true">    <property>        <name>content</name>        <description>Kyuubi-log4j-Properties</description>        <value># Set everything to be logged to the consolelog4j.rootCategory=INFO, consolelog4j.appender.console=org.apache.log4j.ConsoleAppenderlog4j.appender.console.target=System.errlog4j.appender.console.layout=org.apache.log4j.PatternLayoutlog4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss.SSS} %p %c{2}: %m%n
# Set the default kyuubi-ctl log level to WARN. When running the kyuubi-ctl, the# log level for this class is used to overwrite the root logger's log level.log4j.logger.org.apache.kyuubi.ctl.ServiceControlCli=ERROR        </value>        <value-attributes>            <type>content</type>            <show-property-name>false</show-property-name>        </value-attributes>        <on-ambari-upgrade add="true"/>    </property></configuration>

2.2.2.6 Edit ranger-spark-security.xml

<?xml version="1.0"?><configuration>    <property>        <name>ranger.plugin.spark.service.name</name>        <value>{{repo_name}}</value>        <description>Name of the Ranger service containing policies for this SPARK instance</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>ranger.plugin.spark.policy.source.impl</name>        <value>org.apache.ranger.admin.client.RangerAdminRESTClient</value>        <description>Class to retrieve policies from the source</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>ranger.plugin.spark.policy.rest.url</name>        <value>{{policymgr_mgr_url}}</value>        <description>URL to Ranger Admin</description>        <on-ambari-upgrade add="false"/>        <depends-on>            <property>                <type>admin-properties</type>                <name>policymgr_external_url</name>            </property>        </depends-on>    </property>    <property>        <name>ranger.plugin.spark.policy.pollIntervalMs</name>        <value>30000</value>        <description>How often to poll for changes in policies?</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>ranger.plugin.spark.policy.cache.dir</name>        <value>/etc/ranger/{{repo_name}}/policycache</value>        <description>Directory where Ranger policies are cached after successful retrieval from the source</description>        <on-ambari-upgrade add="false"/>    </property></configuration>

This file configures the Spark Ranger plugin parameters.

2.2.2.7 Edit ranger-spark-audit.xml

<?xml version="1.0"?><configuration>    <property>        <name>xasecure.audit.is.enabled</name>        <value>true</value>        <description>Is Audit enabled?</description>        <value-attributes>            <type>boolean</type>        </value-attributes>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>xasecure.audit.destination.db</name>        <value>false</value>        <display-name>Audit to DB</display-name>        <description>Is Audit to DB enabled?</description>        <value-attributes>            <type>boolean</type>        </value-attributes>        <depends-on>            <property>                <type>ranger-env</type>                <name>xasecure.audit.destination.db</name>            </property>        </depends-on>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>xasecure.audit.destination.db.jdbc.driver</name>        <value>{{jdbc_driver}}</value>        <description>Audit DB JDBC Driver</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>xasecure.audit.destination.db.jdbc.url</name>        <value>{{audit_jdbc_url}}</value>        <description>Audit DB JDBC URL</description>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>xasecure.audit.destination.db.password</name>        <value>{{xa_audit_db_password}}</value>        <property-type>PASSWORD</property-type>        <description>Audit DB JDBC Password</description>        <value-attributes>            <type>password</type>        </value-attributes>        <on-ambari-upgrade add="false"/>    </property>    <property>        <name>xasecure.audit.destination.db.user</name>        <value>{{xa_audit_db_user}}</value>        <description>Audit DB JDBC User</description>        <on-ambari-upgrade add="false"/>    </property></configuration>

This file configures the Spark Ranger audit parameters.

2.2.2.8 Edit alerts.json

"KYUUBI_SEVER": [  {    "name": "kyuubi_server_status",    "label": "Kyuubi Server",    "description": "This host-level alert is triggered if the Kyuubi Server cannot be determined to be up.",    "interval": 1,    "scope": "ANY",    "source": {      "type": "SCRIPT",      "path": "DIF/3.0/services/SPARK/package/scripts/alerts/alert_kyuubi_server_port.py",      "parameters": [        {          "name": "check.command.timeout",          "display_name": "Command Timeout",          "value": 120.0,          "type": "NUMERIC",          "description": "The maximum time before check command will be killed by timeout",          "units": "seconds",          "threshold": "CRITICAL"        }      ]    }  }]

This adds an alert definition that checks whether the Kyuubi Server is up: with "interval": 1 the check runs every minute, and check.command.timeout kills a hung check command after 120 seconds. The detection logic is implemented in alert_kyuubi_server_port.py.

2.2.2.9 Edit alert_kyuubi_server_port.py

The implementation of alert_kyuubi_server_port.py can be modeled on Ambari's alert_spark_thrift_port.py, so it is not reproduced in full here. The idea is to periodically attempt a beeline connection and derive the alert state from whether the connection succeeds; a sketch follows below.
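A minimal sketch under those assumptions (the config token, JDBC URL format, and beeline invocation are ours, not copied from alert_spark_thrift_port.py):

#!/usr/bin/env python
# alert_kyuubi_server_port.py -- a sketch modeled on alert_spark_thrift_port.py;
# the config token and beeline invocation below are assumptions.
import socket
import subprocess

OK_STATE = 'OK'
CRITICAL_STATE = 'CRITICAL'

KYUUBI_PORT_KEY = '{{kyuubi-defaults/kyuubi.frontend.thrift.binary.bind.port}}'
CHECK_COMMAND_TIMEOUT_KEY = 'check.command.timeout'

def get_tokens():
    # Config tokens Ambari must resolve before calling execute()
    return (KYUUBI_PORT_KEY,)

def execute(configurations={}, parameters={}, host_name=None):
    host = host_name or socket.getfqdn()
    port = int(configurations.get(KYUUBI_PORT_KEY, 10009))
    timeout = int(float(parameters.get(CHECK_COMMAND_TIMEOUT_KEY, 120.0)))

    # Fast pre-check: is the thrift port reachable at all?
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
    except Exception as e:
        return (CRITICAL_STATE, ['Cannot connect to %s:%s (%s)' % (host, port, e)])
    finally:
        sock.close()

    # Full check: open a beeline session (a valid ticket is needed on kerberized clusters)
    cmd = ['beeline', '-u', 'jdbc:hive2://%s:%s/' % (host, port), '-e', 'select 1']
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output = proc.communicate()[0]
    if proc.returncode == 0:
        return (OK_STATE, ['Kyuubi Server is up on %s:%s' % (host, port)])
    return (CRITICAL_STATE, ['beeline check failed: %s' % output[-200:]])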

2.2.2.10 Edit kerberos.json

{  "name": "kyuubi_service_keytab",  "principal": {    "value": "spark/_HOST@${realm}",    "type" : "service",    "configuration": "kyuubi-defaults/kyuubi.kinit.principal",    "local_username" : "${spark-env/spark_user}"  },  "keytab": {    "file": "${keytab_dir}/spark.service.keytab",    "owner": {      "name": "${spark-env/spark_user}",      "access": "r"    },    "group": {      "name": "${cluster-env/user_group}",      "access": ""    },    "configuration": "kyuubi-defaults/kyuubi.kinit.keytab"  }}

kerberos.json defines how the generated principal and keytab are written into kyuubi-defaults. When Kerberos is enabled on the cluster, the kyuubi.kinit.keytab and kyuubi.kinit.principal properties are added to kyuubi-defaults automatically.

2.2.2.11 Update the ambari-server and ambari-agent RPMs

Apply the modified and newly added files above to the corresponding directories inside the ambari-server and ambari-agent RPM packages.

For a cluster that is already installed, the changes can be applied as follows:

1. Uninstall the Spark service.

2. Place the files added above into the following locations:

/var/lib/ambari-server/resources/stacks/HDP/3.0/services/SPARK
/var/lib/ambari-agent/cache/stacks/DIF/3.0/services/SPARK

3. On the ambari-server host, run sudo ambari-server restart.

4. On every ambari-agent node, run sudo ambari-agent restart.

5. Reinstall the Spark service.

3. Results

3.1 Installation

When installing Spark, the Kyuubi Server component can now be selected.


Kyuubi parameters can be configured in the UI during installation.


3.2 After Installation

The installation completes successfully.


The Kyuubi configuration pages render correctly.


3.3 Stopping the Kyuubi Server

Stop the Kyuubi Server from the Ambari UI.


The UI shows that the stop succeeded.


An alert is raised indicating that the Kyuubi Server is down.


Restart the Kyuubi Server.


The alert clears.


Connecting through beeline as the ambari-qa user from the command line succeeds.
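For reference, a connection command like the following can be used (the host name is a placeholder; 10009 is the port configured in kyuubi-defaults above):

beeline -u "jdbc:hive2://<kyuubi-server-host>:10009/" -n ambari-qa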


The YARN UI shows the ambari-qa user's application running successfully.


