1.下载
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
本次下载:https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
解压到/home/hadoop/sqoop
拷贝sqoop-1.4.7.jar到/home/hadoop/sqoop/lib目录下
2.配置conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/hadoop
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/hive
3.配置环境变量~/.bashrc
验证sqoop
sqoop version
4.执行命名报错,主要是确实一些tar
sqoop/lib
sqoop-1.4.7.jar
mysql-connector-java-5.1.39-bin.jar
commons-lang3-3.3.2.jar
avro-tools-1.8.1.jar
下载地址:https://maven.aliyun.com/mvn/search
5.将mysql数据库中的表数据导入到HDFS指定目录
bin/sqoop import \
--connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
--username root \
--password root \
--target-dir /sqoop/post \
--table post --m 1
6.导入表数据子集
sqoop import \
--connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
--username root \
--password root \
--where "user_id>1" \
--target-dir /sqoop/post2 \
--table post \
--m 1
7.增量导入
sqoop import \
--connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
--username root \
--password root \
--target-dir /sqoop/post \
--table post \
--m 1 \
--incremental append \
--check-column id \
--last-value 9
hadoop fs -cat /sqoop/post/part-m-*
8.hdfs导出数据到mysql
sqoop export \
--connect jdbc:mysql://192.168.15.45:3307/a_spider_test?characterEncoding=UTF-8 \
--username root \
--password root \
--table student \
--m 1 \
--export-dir /user/hive/warehouse/jstudent
characterEncoding=UTF-8解决导入到mysql的数据中文乱码问题。
9.作业(job)环境配置
配置mysql为metastore
修改配置文件sqoop-site.xml,以下配置默认是被注释掉的,现在将它都打开:
<property>
<name>sqoop.metastore.client.autoconnect.url</name>
<value>jdbc:mysql://192.168.15.45:3307/sqoop?createDatabaseIfNotExist=true</value>
<description>The connect string to use when connecting to a
job-management metastore. If unspecified, uses ~/.sqoop/.
You can specify a different path here.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.username</name>
<value>root</value>
<description>The username to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.client.autoconnect.password</name>
<value>root</value>
<description>The password to bind to the metastore.
</description>
</property>
<property>
<name>sqoop.metastore.client.record.password</name>
<value>true</value>
<description>If true, allow saved passwords in the metastore.
</description>
</property>
<property>
<name>sqoop.jobbase.serialize.sqoopoptions</name>
<value>true</value>
<description>If true, then all options will be serialized into job.xml
</description>
</property>
<property>
<name>sqoop.metastore.server.location</name>
<value>/tmp/sqoop-metastore/shared.db</value>
<description>Path to the shared metastore database files.
If this is not set, it will be placed in ~/.sqoop/.
</description>
</property>
<property>
<name>sqoop.metastore.server.port</name>
<value>16000</value>
<description>Port that this metastore should listen on.
</description>
</property>
10.启动metastore
sqoop metastore &
11.运行sqoop job --list 报错
java.io.IOException: Invalid metadata version
解决办法:
进入mysql的sqoop数据库中,自动生成了SQOOP_ROOT表
在表中插入:
INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0');
再次运行就不会报错
12.创建job过程又报错,缺少jar包
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-beanutils-1.7.0.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-collections-3.2.2.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-lang-2.6.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-logging-1.2.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/json-20070829.jar /home/hadoop/sqoop/lib/
13.创建import的job作业
sqoop job --create myjob1 \
-- import \
--connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
--username root \
--password root \
--target-dir /sqoop/zhilian \
--table zhiliancookies \
--m 1
14.执行作业
sqoop job --exec myjob1
15.代码定制
sqoop codegen --connect jdbc:mysql://192.168.15.45:3307/a_spider_test --username root --password root --table post