HELP-15764

[fiware-stackoverflow] How to match column between HDFS File(save through cygnus) to Hive table?

    Details

      Description

      Created question in FIWARE Q/A platform on 02-05-2019 at 10:05
      Please, ANSWER this question AT https://stackoverflow.com/questions/55948021/how-to-match-column-between-hdfs-filesave-through-cygnus-to-hive-table

      Question:
      How to match column between HDFS File(save through cygnus) to Hive table?

      Description:
      I want to save some data in Hive through fiware-cygnus.

      I set up the conf file as follows:

      cygnus-ngsi.sources = http-source
      cygnus-ngsi.sinks = hdfs-sink
      cygnus-ngsi.channels = hdfs-channel

      cygnus-ngsi.sources.http-source.channels = hdfs-channel
      cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
      cygnus-ngsi.sources.http-source.port = 5050
      cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
      cygnus-ngsi.sources.http-source.handler.notification_target = /notify
      cygnus-ngsi.sources.http-source.handler.default_service = def_serv
      cygnus-ngsi.sources.http-source.handler.default_service_path = /def_servpath
      cygnus-ngsi.sources.http-source.handler.events_ttl = 2
      cygnus-ngsi.sources.http-source.interceptors = ts gi
      cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
      cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
      cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /Applications/apache-flume-1.4.0-bin/conf/grouping_rules.conf

      cygnus-ngsi.channels.hdfs-channel.type = memory
      cygnus-ngsi.channels.hdfs-channel.capacity = 1000
      cygnus-ngsi.channels.hdfs-channel.transactionCapacity = 100

      cygnus-ngsi.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.NGSIHDFSSink
      cygnus-ngsi.sinks.hdfs-sink.channel = hdfs-channel
      cygnus-ngsi.sinks.hdfs-sink.enable_encoding = true
      cygnus-ngsi.sinks.hdfs-sink.enable_grouping = false
      cygnus-ngsi.sinks.hdfs-sink.enable_lowercase = false
      cygnus-ngsi.sinks.hdfs-sink.enable_name_mappings = false
      cygnus-ngsi.sinks.hdfs-sink.data_model = dm-by-entity
      cygnus-ngsi.sinks.hdfs-sink.file_format = json-row
      cygnus-ngsi.sinks.hdfs-sink.backend.impl = rest
      cygnus-ngsi.sinks.hdfs-sink.backend.max_conns = 500
      cygnus-ngsi.sinks.hdfs-sink.backend.max_conns_per_route = 100
      cygnus-ngsi.sinks.hdfs-sink.hdfs_host = [hdfs_host]
      cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
      cygnus-ngsi.sinks.hdfs-sink.hdfs_username = [username]
      cygnus-ngsi.sinks.hdfs-sink.hdfs_password = [password]
      cygnus-ngsi.sinks.hdfs-sink.oauth2_token = [token]
      cygnus-ngsi.sinks.hdfs-sink.service_as_namespace = false
      cygnus-ngsi.sinks.hdfs-sink.batch_size = 100
      cygnus-ngsi.sinks.hdfs-sink.batch_timeout = 30
      cygnus-ngsi.sinks.hdfs-sink.batch_ttl = 10
      cygnus-ngsi.sinks.hdfs-sink.batch_retry_intervals = 5000
      cygnus-ngsi.sinks.hdfs-sink.hive = true
      cygnus-ngsi.sinks.hdfs-sink.hive.server_version = 2
      cygnus-ngsi.sinks.hdfs-sink.hive.host = [hive_host]
      cygnus-ngsi.sinks.hdfs-sink.hive.port = 10000
      cygnus-ngsi.sinks.hdfs-sink.hive.db_type = default-db
      cygnus-ngsi.sinks.hdfs-sink.krb5_auth = false

      and I found the HDFS file was created:

      {"recvx0054imex0054s":"1556776186","recvx0054ime":"2019-05-02T05:49:46.771Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"lux","attrx0054ype":"Float","attrx0056alue":"7","attrx004dd":[]} {"recvx0054imex0054s":"1556776186","recvx0054ime":"2019-05-02T05:49:46.771Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"status","attrx0054ype":"string","attrx0056alue":"off","attrx004dd":[]} {"recvx0054imex0054s":"1556776989","recvx0054ime":"2019-05-02T06:03:09.962Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"lux","attrx0054ype":"Float","attrx0056alue":"7","attrx004dd":[]} {"recvx0054imex0054s":"1556776989","recvx0054ime":"2019-05-02T06:03:09.962Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"status","attrx0054ype":"string","attrx0056alue":"off","attrx004dd":[]}

      But in Hive, all the data shows NULL values:

      hive> select * from ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row;
      chmod: changing permissions of 'hdfs://hadoop01:9000/tmp/hive/hive/ae8cd691-6e96-428d-8ece-02f6e6fbf527/hive_2019-05-02_16-37-44_821_8122085527913292696-1/mr-10001': Permission denied: user=hive, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx-w---
      OK
      ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.recvx0054imex0054s ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.recvx0054ime ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.fiwarex0053ervicex0050ath ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.entityx0049d ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.entityx0054ype ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx004eame ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx0054ype ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx0056alue ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx004dd
      NULL NULL NULL NULL NULL NULL NULL NULL NULL
      NULL NULL NULL NULL NULL NULL NULL NULL NULL
      NULL NULL NULL NULL NULL NULL NULL NULL NULL
      NULL NULL NULL NULL NULL NULL NULL NULL NULL
      Time taken: 0.26 seconds, Fetched: 4 row(s)
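
      To double-check which column names Hive actually registered for the generated table, a plain DESCRIBE can be run (a hypothetical diagnostic step, not taken from the Cygnus docs):

      hive> describe default.ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row;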

      The external-table creation query shown in the Cygnus debug console message is:

      create external table if not exists
      default.ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row (
        recvx0054imex0054s bigint,
        recvx0054ime string,
        fiwarex0053ervicex0050ath string,
        entityx0049d string,
        entityx0054ype string,
        attrx004eame string,
        attrx0054ype string,
        attrx0056alue string,
        attrx004dd array<struct<name:string,type:string,value:string>>
      )
      row format serde 'org.openx.data.jsonserde.JsonSerDe'
      with serdeproperties ("dots.in.keys" = "true")
      location '/user/ngsi/Light/LightInfo/bedroom_light_Light'

      I guess the mismatch between the Hive table column names and the HDFS file attribute names is the problem:
      the Hive table column names use the encoded form, while the original attribute names are not encoded (e.g. recvTimeTs vs. recvx0054imex0054s).
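
      If the encoded names are indeed the problem, one thing that might be worth trying is to turn off the new encoding in the sink (an untested sketch: enable_encoding is set to true in the configuration above, and with false Cygnus should fall back to the old naming, i.e. plain attribute names such as recvTimeTs):

      # untested: keep original attribute names in the HDFS files and Hive columns
      cygnus-ngsi.sinks.hdfs-sink.enable_encoding = false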

      To verify the guess, I created an external table myself and tested it:

      create external table if not exists default.test01 (
        recvx0054imex0054s bigint,
        recvx0054ime string,
        fiwarex0053ervicex0050ath string,
        entityx0049d string,
        entityx0054ype string,
        attrx004eame string,
        attrx0054ype string,
        attrx0056alue string,
        attrx004dd array<struct<name:string,type:string,value:string>>
      )
      row format serde 'org.openx.data.jsonserde.JsonSerDe'
      with serdeproperties ("dots.in.keys" = "true")
      location '/user/ngsi/Light/LightInfo/bedroom_light_Light'

      As a result, it shows the right values:

      hive> select * from test01;
      chmod: changing permissions of 'hdfs://hadoop01:9000/tmp/hive/hive/bec8d67d-7d94-4cdc-a577-14d990ca342c/hive_2019-05-02_16-42-59_290_896718721577759467-1/mr-10001': Permission denied: user=hive, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx-w---
      OK
      1556776186 2019-05-02T05:49:46.771Z /LightInfo bedroom3_light Light lux Float 7 []
      1556776186 2019-05-02T05:49:46.771Z /LightInfo bedroom3_light Light status string off []
      1556776989 2019-05-02T06:03:09.962Z /LightInfo bedroom3_light Light lux Float 7 []
      1556776989 2019-05-02T06:03:09.962Z /LightInfo bedroom3_light Light status string off []
      Time taken: 2.329 seconds, Fetched: 4 row(s)
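
      Since the hand-made table with encoded column names works, one idea would be to map readable Hive column names onto the encoded JSON keys by hand (a sketch only, assuming the OpenX JsonSerDe honours its per-column "mapping.*" serde properties; the table name light_row_decoded is made up), though writing this out for every table is tedious:

      create external table if not exists default.light_row_decoded (
        recvTimeTs bigint,
        recvTime string,
        fiwareServicePath string,
        entityId string,
        entityType string,
        attrName string,
        attrType string,
        attrValue string,
        attrMd array<struct<name:string,type:string,value:string>>
      )
      row format serde 'org.openx.data.jsonserde.JsonSerDe'
      with serdeproperties (
        "mapping.recvtimets" = "recvx0054imex0054s",
        "mapping.recvtime" = "recvx0054ime",
        "mapping.fiwareservicepath" = "fiwarex0053ervicex0050ath",
        "mapping.entityid" = "entityx0049d",
        "mapping.entitytype" = "entityx0054ype",
        "mapping.attrname" = "attrx004eame",
        "mapping.attrtype" = "attrx0054ype",
        "mapping.attrvalue" = "attrx0056alue",
        "mapping.attrmd" = "attrx004dd"
      )
      location '/user/ngsi/Light/LightInfo/bedroom_light_Light'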

      How can I match the Hive table column names with the HDFS file attribute names?

      Or is there another way to use Cygnus together with Hive?

        Activity

        backlogmanager Backlog Manager added a comment - 2019-05-02 12:06|CREATED monitor | # answers= 0, accepted answer= False

          People

          • Assignee:
            fermin Fermín Galán
            Reporter:
            backlogmanager Backlog Manager
          • Votes:
            0
            Watchers:
            1
