Home
Welcome to the opensoc-streaming wiki!
CIF data can be loaded into HBase using the utility class com.opensoc.dataloads.cif.HBaseTableLoad, located in the OpenSoc-Dataloads folder.
The class takes two arguments: the directory containing the CIF data and the name of the target HBase table.
java -cp OpenSOC-Topologies-0.3BETA-SNAPSHOT.jar com.opensoc.dataloads.cif.HBaseTableLoad directoryname hbaseTableName
The HBase configuration is read from an hbase-site.xml file on the classpath. A default hbase-site.xml is provided in the OpenSoc-Dataloads folder. To use a different configuration, put the directory that contains your hbase-site.xml on the classpath ahead of the jar. Java classpath entries must be directories or jars, so list the directory rather than the XML file itself, for example:
java -cp /etc/hbase/conf:OpenSOC-Topologies-0.3BETA-SNAPSHOT.jar com.opensoc.dataloads.cif.HBaseTableLoad directoryname hbaseTableName
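To confirm which hbase-site.xml will be used, you can ask a class loader for the resource. The sketch below relies on one documented behavior: Hadoop/HBase configuration loads "hbase-site.xml" as a classpath resource, and the first match wins. HBaseSiteLocator is a hypothetical helper and is not part of OpenSOC.

```java
import java.net.URL;

// Hypothetical diagnostic helper, not part of OpenSOC: reports which
// hbase-site.xml a class loader would resolve. Class loaders return the
// first matching resource in classpath order, so an override directory
// must be listed before the jar.
class HBaseSiteLocator {

    // Returns the URL of the first hbase-site.xml visible to the loader,
    // or null if no such resource exists.
    static URL findHBaseSite(ClassLoader loader) {
        return loader.getResource("hbase-site.xml");
    }

    public static void main(String[] args) {
        URL site = findHBaseSite(Thread.currentThread().getContextClassLoader());
        System.out.println(site == null
                ? "No hbase-site.xml found on the classpath"
                : "Using " + site);
    }
}
```

Run it with the same -cp value you pass to the loader, with the directory containing the compiled class appended (e.g. java -cp /etc/hbase/conf:OpenSOC-Topologies-0.3BETA-SNAPSHOT.jar:. HBaseSiteLocator). It then prints the location of the configuration file that will actually be picked up.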
The class assumes the source files are gzip-compressed and contain JSON data. Currently only domain, email and infrastructure data are loaded; URL and malware data are not.
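The sketch below shows one way to read the records of a single dataset directory. It assumes each .gz file holds one JSON record per line. This page does not specify the record layout, so treat that as an assumption. GzipJsonReader is a hypothetical name and does not reproduce the actual HBaseTableLoad code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch, not the actual HBaseTableLoad code: reads every .gz
// file in one dataset directory (e.g. domain_botnet/) and returns the
// non-blank lines, assuming one JSON record per line.
class GzipJsonReader {

    static List<String> readRecords(Path datasetDir) throws IOException {
        List<String> records = new ArrayList<>();
        // Only files ending in .gz are read; anything else is skipped.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(datasetDir, "*.gz")) {
            for (Path file : files) {
                try (InputStream in = new GZIPInputStream(Files.newInputStream(file));
                     BufferedReader reader = new BufferedReader(
                             new InputStreamReader(in, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        if (!line.trim().isEmpty()) {
                            records.add(line); // raw JSON text; parsing is left to the caller
                        }
                    }
                }
            }
        }
        return records;
    }
}
```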
The following datasets (one directory each) are loaded into the HBase table:
domain_botnet/ domain_fastflux/ domain_malware/ domain_phishing/ domain_spam/ domain_spamvertising/ domain_suspicious/ domain_whitelist/
email_phishing/ email_registrant/ email_spam/ email_spamvertising/ email_suspicious/ email_whitelist/
infrastructure_botnet/ infrastructure_fastflux/ infrastructure_malware/ infrastructure_phishing/ infrastructure_scan/ infrastructure_spam/ infrastructure_spamvertising/ infrastructure_suspicious/ infrastructure_warez/ infrastructure_whitelist/
Each directory name maps to an HBase column. The part before the underscore is the column family, and the part after it is the column qualifier. For example, for domain_botnet, domain is the column family and botnet is the qualifier.
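The naming rule can be expressed as a small function. CifColumn is a hypothetical helper written for this page and is not the loader's own code. It splits on the first underscore, which suffices because every listed directory name contains exactly one.

```java
// Hypothetical helper, not the actual HBaseTableLoad code: derives the HBase
// column family and qualifier from a CIF dataset directory name such as
// "domain_botnet/".
class CifColumn {
    final String family;
    final String qualifier;

    private CifColumn(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    static CifColumn fromDirectoryName(String dirName) {
        // Drop a trailing slash, as written in the dataset list.
        String name = dirName.endsWith("/")
                ? dirName.substring(0, dirName.length() - 1)
                : dirName;
        int sep = name.indexOf('_');
        if (sep <= 0 || sep == name.length() - 1) {
            throw new IllegalArgumentException(
                    "Expected <family>_<qualifier>, got: " + dirName);
        }
        // Text before the first underscore is the family; the rest is the qualifier.
        return new CifColumn(name.substring(0, sep), name.substring(sep + 1));
    }

    @Override
    public String toString() {
        return family + ":" + qualifier;
    }
}
```

For example, CifColumn.fromDirectoryName("infrastructure_scan/") yields family infrastructure and qualifier scan, printed as infrastructure:scan.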
The loader uses the value of the JSON field "address" as the HBase row key. The cell value is always the string "Y", which acts as a simple boolean flag.
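The sketch below turns one JSON record into the cell described above. The row key is the "address" value, the column is family:qualifier, and the value is "Y". CifCell is a hypothetical stand-in for an HBase Put, so the example runs without the HBase client. A regular expression replaces a real JSON parser here, which only works for a flat, string-valued "address" field without escaped quotes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the row layout described above. CifCell stands in
// for an HBase Put so the example runs without the HBase client.
class CifCell {
    final String rowKey;     // value of the JSON "address" field
    final String family;     // e.g. "domain"
    final String qualifier;  // e.g. "botnet"
    final String value;      // always the flag "Y"

    // Simplification: matches only a flat, string-valued "address" field
    // without escaped quotes; a real loader should use a JSON parser.
    private static final Pattern ADDRESS =
            Pattern.compile("\"address\"\\s*:\\s*\"([^\"]*)\"");

    private CifCell(String rowKey, String family, String qualifier, String value) {
        this.rowKey = rowKey;
        this.family = family;
        this.qualifier = qualifier;
        this.value = value;
    }

    // Returns the cell to write for one JSON record, or empty if the record
    // has no non-empty "address" field to use as the row key.
    static Optional<CifCell> fromJson(String json, String family, String qualifier) {
        Matcher m = ADDRESS.matcher(json);
        if (!m.find() || m.group(1).isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new CifCell(m.group(1), family, qualifier, "Y"));
    }
}
```

Storing a constant flag keeps each cell small. A lookup only needs to check whether the row for an address has a cell in a given family:qualifier column, for example whether domain:botnet exists for a domain.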