{ "job": { "content": [ { "reader": { "name": "mysqlreader", "parameter": { "column": [ "id", "type_id", "type", "sale_type", "trademark", "company", "seating_capacity", "power_type", "charge_type", "category", "weight_kg", "warranty" ], "connection": [ { "jdbcUrl": [ "jdbc:mysql://hadoop102:3306/car_data" ], "table": [ "car_info" ] } ], "password": "000000", "splitPk": "", "username": "root" } }, "writer": { "name": "hdfswriter", "parameter": { "column": [ { "name": "id", "type": "string" }, { "name": "type_id", "type": "string" }, { "name": "type", "type": "string" }, { "name": "sale_type", "type": "string" }, { "name": "trademark", "type": "string" }, { "name": "company", "type": "string" }, { "name": "seating_capacity", "type": "bigint" }, { "name": "power_type", "type": "string" }, { "name": "charge_type", "type": "string" }, { "name": "category", "type": "string" }, { "name": "weight_kg", "type": "bigint" }, { "name": "warranty", "type": "string" } ], "hadoopConfig": { "dfs.nameservices": "mycluster", "dfs.namenode.rpc-address.mycluster.nn2": "hadoop103:8020", "dfs.namenode.rpc-address.mycluster.nn1": "hadoop102:8020", "dfs.client.failover.proxy.provider.mycluster": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", "dfs.ha.namenodes.mycluster": "nn1,nn2" }, "compress": "gzip", "defaultFS": "hdfs://mycluster", "fieldDelimiter": " ", "fileName": "car_info", "fileType": "text", "path": "${targetdir}", "writeMode": "append" } } } ], "setting": { "speed": { "channel": 1 } } } }
1. Test the DataX job
DataX requires the destination directory on HDFS to exist before it writes data, so before running the DataX job, first create the target directory: hadoop fs -mkdir -p /origin_data/car_info/2023-05-01
Then run the following command: bin/datax.py job/car_info.json -p"-Dtargetdir=/origin_data/car_info/2023-05-01"
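Because the writer's path is the ${targetdir} placeholder, the same job file can be reused for any business date by passing a different directory through -p. A minimal wrapper sketch (an assumption for illustration, not part of the original setup), run from the DataX home directory and defaulting to yesterday's date:
#!/bin/bash
# Import car_info for the given date (default: yesterday); assumes the current directory is the DataX home
do_date=${1:-$(date -d '-1 day' +%F)}
target_dir=/origin_data/car_info/$do_date
hadoop fs -mkdir -p "$target_dir"
python bin/datax.py job/car_info.json -p"-Dtargetdir=$target_dir"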
After the job completes, check whether data has appeared in the /origin_data/car_info/2023-05-01 directory on HDFS.
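For example, assuming the files were written with the gzip compression configured above, list the directory and let hadoop fs -text decompress a few rows on the fly:
hadoop fs -ls /origin_data/car_info/2023-05-01
hadoop fs -text /origin_data/car_info/2023-05-01/* | head -n 5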
Note: when testing, you can hard-code the writer's "path" parameter as an absolute path instead of the ${targetdir} placeholder, e.g. "path": "/origin_data/car_info/2023-05-01".