
Commit 0b40c40

move hbase files
1 parent 5ee6236 commit 0b40c40

31 files changed, +6936 -6908 lines changed

articles/hdinsight/hadoop/apache-hadoop-introduction.md

+188-188

articles/hdinsight/hadoop/hdinsight-use-hive.md

+242-242

articles/hdinsight/hbase/apache-hbase-build-java-maven-linux.md

+689
@@ -0,0 +1,82 @@
---
title: What is HBase in Azure HDInsight? | Microsoft Docs
description: An introduction to Apache HBase in HDInsight, a NoSQL database built on Hadoop. Learn about use cases and compare HBase to other Hadoop clusters.
keywords: bigtable,nosql,what is hbase,apache hbase,hbase,habase overview,
services: hdinsight
documentationcenter: ''
tags: azure-portal
author: mumian
manager: jhubbard
editor: cgronlun

ms.assetid: d2a76d53-133a-4849-a30c-88d9c794391c
ms.service: hdinsight
ms.custom: hdinsightactive,hdiseo17may2017
ms.workload: big-data
ms.tgt_pltfrm: na
ms.devlang: na
ms.topic: get-started-article
ms.date: 07/17/2017
ms.author: jgao

---
# What is HBase in HDInsight: A NoSQL database that provides BigTable-like capabilities for Hadoop
Apache HBase is an open-source NoSQL database that is built on Hadoop and modeled after Google BigTable. HBase provides random access and strong consistency for large amounts of unstructured and semistructured data in a schemaless database organized by column families.

Data is stored in the rows of a table, and data within a row is grouped by column family. HBase is a schemaless database in the sense that neither the columns nor the type of data stored in them need to be defined before using them. The open-source code scales linearly to handle petabytes of data on thousands of nodes. It can rely on data redundancy, batch processing, and other features that are provided by distributed applications in the Hadoop ecosystem.
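
The row, column family, and column layout described above can be pictured as nested maps. The following Python sketch is a toy, in-memory model for illustration only; `ToyTable` and its methods are hypothetical names, not the HBase client API. It assumes, as in HBase, that column families are declared when a table is created, while column qualifiers and value types are not declared at all (values are kept as raw bytes).

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase API).

    Layout: row key -> column family -> column qualifier -> value (bytes).
    """

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = frozenset(column_families)
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value: bytes):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family!r}")
        # Qualifiers are never declared, so each row can hold different columns.
        self._rows[row_key][family][qualifier] = value

    def get(self, row_key):
        # dict.get() on the outer defaultdict does not create a missing row.
        row = self._rows.get(row_key, {})
        return {family: dict(cells) for family, cells in row.items()}


if __name__ == "__main__":
    contacts = ToyTable(["Personal", "Office"])
    contacts.put("1000", "Personal", "Name", b"John Dole")
    contacts.put("1000", "Office", "Phone", b"425-555-0100")
    # A second row with a column that the first row does not have.
    contacts.put("8396", "Personal", "City", b"Seattle")
    print(contacts.get("1000"))
    print(contacts.get("8396"))
```

Writing to an undeclared family fails, while writing a brand-new qualifier such as `City` does not; that split is what "schemaless" means here.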

## How is HBase implemented in Azure HDInsight?
HDInsight HBase is offered as a managed cluster that is integrated into the Azure environment. The clusters are configured to store data directly in [Azure Storage](./../hdinsight-hadoop-use-blob-storage.md) or [Azure Data Lake Store](./../hdinsight-hadoop-use-data-lake-store.md), which provides low latency and increased elasticity in performance and cost choices. This enables customers to build interactive websites that work with large datasets, to build services that store sensor and telemetry data from millions of endpoints, and to analyze this data with Hadoop jobs. HBase and Hadoop are good starting points for big data projects in Azure; in particular, they can enable real-time applications to work with large datasets.

The HDInsight implementation uses the scale-out architecture of HBase to provide automatic sharding of tables, strong consistency for reads and writes, and automatic failover. Performance is enhanced by in-memory caching for reads and high-throughput streaming for writes. HBase clusters can also be created inside an Azure virtual network. For details, see [Create HDInsight clusters on Azure Virtual Network][hbase-provision-vnet].
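
Automatic sharding in HBase is range-based: the rows of a table, sorted by row key, are divided into regions, each covering a contiguous range from a start key (inclusive) up to the next region's start key (exclusive), and a region is split in two as it grows. The following Python sketch only illustrates that mapping and is not HBase code. It assumes the region boundaries are available as a sorted list of start keys whose first entry is the empty key; `find_region` and `split_region` are hypothetical helpers.

```python
import bisect


def find_region(region_start_keys, row_key):
    """Return the index of the region whose [start, next_start) range holds row_key.

    region_start_keys must be sorted and begin with b"" (the first region
    starts at the empty key, so every row key falls into some region).
    """
    if not region_start_keys or region_start_keys[0] != b"":
        raise ValueError("first region must start at the empty key")
    # bisect_right places a key equal to a start key after it, so that row
    # lands in the region that begins with it (start keys are inclusive).
    return bisect.bisect_right(region_start_keys, row_key) - 1


def split_region(region_start_keys, split_key):
    """Simulate a split: add split_key as a new boundary, keeping order."""
    new_keys = list(region_start_keys)
    if split_key not in new_keys:
        bisect.insort(new_keys, split_key)
    return new_keys


if __name__ == "__main__":
    boundaries = [b"", b"g", b"p"]          # three regions
    print(find_region(boundaries, b"kiwi"))  # falls in [b"g", b"p")
    boundaries = split_region(boundaries, b"m")
    print(find_region(boundaries, b"kiwi"))  # now in [b"g", b"m")
```

Python compares `bytes` lexicographically by unsigned byte value, which matches how HBase orders row keys.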

## How is data managed in HDInsight HBase?
Data can be managed in HBase by using the `create`, `get`, `put`, and `scan` commands from the HBase shell. Tables are created by using `create`, data is written to the database by using `put`, and a single row is read by using `get`. The `scan` command is used to obtain data from multiple rows in a table. Data can also be managed by using the HBase C# API, which provides a client library on top of the HBase REST API. An HBase database can also be queried by using Hive. For an introduction to these programming models, see [Get started using HBase with Hadoop in HDInsight][hbase-get-started]. Coprocessors are also available; they allow data processing in the nodes that host the database.
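
The difference between `get` and `scan` follows from how HBase orders data: rows are kept sorted by row key (compared byte by byte), so a scan walks a contiguous key range, with the start row inclusive and the stop row exclusive. In the HBase shell this looks like `create 'Contacts', 'Personal', 'Office'`, `put 'Contacts', '1000', 'Personal:Name', 'John Dole'`, `get 'Contacts', '1000'`, and `scan 'Contacts', {STARTROW => '1000', STOPROW => '2000'}`. The Python sketch below models only those semantics in memory; `ToyStore` is a hypothetical class, not an HBase client library.

```python
import bisect


class ToyStore:
    """Hypothetical model of put/get/scan over sorted row keys (not HBase code)."""

    def __init__(self):
        self._keys = []  # row keys in sorted (byte-lexicographic) order
        self._rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        return dict(self._rows.get(row, {}))

    def scan(self, start_row=b"", stop_row=None):
        """Yield (row, cells) in row-key order; start inclusive, stop exclusive."""
        i = bisect.bisect_left(self._keys, start_row)
        while i < len(self._keys):
            row = self._keys[i]
            if stop_row is not None and row >= stop_row:
                break
            yield row, dict(self._rows[row])
            i += 1


if __name__ == "__main__":
    store = ToyStore()
    for row in (b"3000", b"1000", b"2500"):
        store.put(row, "Personal:Name", b"name-" + row)
    # Rows come back sorted by key; b"3000" is excluded (stop row is exclusive).
    for row, cells in store.scan(b"1000", b"3000"):
        print(row, cells)
```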

> [!NOTE]
> Thrift is not supported by HBase in HDInsight.
>

## Scenarios: Use cases for HBase
BigTable (and by extension, HBase) was originally created for web search: search engines build indexes that map terms to the web pages that contain them. HBase also suits many other use cases, several of which are described in this section.

* Key-value store

    HBase can be used as a key-value store, and it is suitable for managing message systems. Facebook uses HBase for its messaging system, and HBase is well suited to storing and managing Internet communications. WebTable uses HBase to search for and manage tables that are extracted from webpages.
* Sensor data

    HBase is useful for capturing data that is collected incrementally from various sources. This includes social analytics, time series, keeping interactive dashboards up-to-date with trends and counters, and managing audit log systems. Examples include the Bloomberg trader terminal and the Open Time Series Database (OpenTSDB), which stores and provides access to metrics collected about the health of server systems.
* Real-time query

    [Phoenix](http://phoenix.apache.org/) is a SQL query engine for Apache HBase. It is accessed as a JDBC driver, and it enables querying and managing HBase tables by using SQL.
* HBase as a platform

    Applications can run on top of HBase by using it as a datastore. Examples include Phoenix, OpenTSDB, Kiji, and Titan. Applications can also integrate with HBase. Examples include Hive, Pig, Solr, Storm, Flume, Impala, Spark, Ganglia, and Drill.
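
For the sensor data scenario, the row key carries most of the design weight, because rows are stored sorted by key and scans read contiguous key ranges. The following Python sketch shows one hypothetical layout, not an HDInsight or OpenTSDB format: the sensor ID, a `#` separator, and the reading's timestamp rounded down to the hour and encoded as an 8-byte big-endian integer. All readings from one sensor in one hour then share a row, and a sensor's rows sort chronologically. The one-hour bucket, the separator, and the helper names are assumptions.

```python
import struct

# Assumptions for this sketch: one row per sensor per hour, a "#" separator,
# sensor IDs that never contain "#", and non-negative Unix timestamps.
BUCKET_SECONDS = 3600
SEPARATOR = b"#"


def row_key(sensor_id: str, epoch_seconds: int) -> bytes:
    """Row key = sensor ID + '#' + hour bucket (8-byte big-endian)."""
    bucket = epoch_seconds - (epoch_seconds % BUCKET_SECONDS)
    # Big-endian unsigned integers sort bytewise in numeric order, so a
    # sensor's rows come back from a scan in chronological order.
    return sensor_id.encode("utf-8") + SEPARATOR + struct.pack(">Q", bucket)


def column_qualifier(epoch_seconds: int) -> bytes:
    """Seconds offset within the hour bucket (0 to 3599), as 2 bytes."""
    return struct.pack(">H", epoch_seconds % BUCKET_SECONDS)


def scan_range(sensor_id: str, start_epoch: int, end_epoch: int):
    """Return (start_row, stop_row) covering every row that can hold a
    reading in [start_epoch, end_epoch); stop_row is exclusive.

    Rows are whole hours, so readings at the edges still need filtering.
    """
    if end_epoch <= start_epoch:
        raise ValueError("end_epoch must be greater than start_epoch")
    start_row = row_key(sensor_id, start_epoch)
    # The bucket right after the one holding the last covered second.
    stop_row = row_key(sensor_id, end_epoch - 1 + BUCKET_SECONDS)
    return start_row, stop_row


if __name__ == "__main__":
    t = 1_500_000_000
    print(row_key("sensor-7", t).hex())
    print(scan_range("sensor-7", t, t + 2 * BUCKET_SECONDS))
```

Leading with the sensor ID keeps one sensor's history contiguous; a design that led with the timestamp would instead send all current writes to the same region.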

## <a name="next-steps"></a>Next steps
* [Get started using HBase with Hadoop in HDInsight][hbase-get-started]
* [Create HDInsight clusters on Azure Virtual Network][hbase-provision-vnet]
* [Configure HBase replication in HDInsight](apache-hbase-replication.md)
* [Use Maven to build Java applications that use HBase with HDInsight (Hadoop)][hbase-build-java-maven]

## <a name="see-also"></a>See also
* [Apache HBase](https://hbase.apache.org/)
* [Bigtable: A Distributed Storage System for Structured Data](http://research.google.com/archive/bigtable.html)

[hbase-provision-vnet]: apache-hbase-provision-vnet.md

[hbase-build-java-maven]: hdinsight-hbase-build-java-maven.md

[hdinsight-use-hive]: ../hdinsight-use-hive.md

[hdinsight-storage]: ../../hdinsight-hadoop-use-blob-storage.md

[hbase-get-started]: http://azure.microsoft.com/documentation/articles/hdinsight-hbase-get-started/

[azure-purchase-options]: http://azure.microsoft.com/pricing/purchase-options/
[azure-member-offers]: http://azure.microsoft.com/pricing/member-offers/
[azure-free-trial]: http://azure.microsoft.com/pricing/free-trial/
[azure-management-portal]: https://portal.azure.com/
[azure-create-storageaccount]: ../../storage/common/storage-create-storage-account.md

[apache-hadoop]: http://hadoop.apache.org/
@@ -0,0 +1,91 @@
---
title: Use Apache Phoenix and SQLLine with HBase in Azure HDInsight | Microsoft Docs
description: Learn how to use Apache Phoenix in HDInsight. Also, learn how to install and set up SQLLine on your computer to connect to an HBase cluster in HDInsight.
services: hdinsight
documentationcenter: ''
author: mumian
manager: jhubbard
editor: cgronlun

ms.assetid: cda0f33b-a2e8-494c-972f-ae0bb482b818
ms.service: hdinsight
ms.custom: hdinsightactive
ms.devlang: na
ms.topic: article
ms.tgt_pltfrm: na
ms.workload: big-data
ms.date: 09/22/2017
ms.author: jgao

---
# Use Apache Phoenix with Linux-based HBase clusters in HDInsight
Learn how to use [Apache Phoenix](http://phoenix.apache.org/) in Azure HDInsight, and how to use SQLLine. For more information about Phoenix, see [Phoenix in 15 minutes or less](http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html). For the Phoenix grammar, see [Phoenix grammar](http://phoenix.apache.org/language/index.html).

> [!NOTE]
> For Phoenix version information about HDInsight, see [What's new in the Hadoop cluster versions provided by HDInsight](../hdinsight-component-versioning.md).
>

## Use SQLLine
[SQLLine](http://sqlline.sourceforge.net/) is a command-line utility to execute SQL.

### Prerequisites
Before you can use SQLLine, you must have the following item:

* **An HBase cluster in HDInsight**. To create one, see [Get started with Apache HBase in HDInsight](./apache-hbase-tutorial-get-started.md).

When you connect to an HBase cluster, you need to connect to one of the ZooKeeper VMs. Each HDInsight cluster has three ZooKeeper VMs.

**To get the ZooKeeper host name**

1. Open Ambari by browsing to **https://\<cluster name\>.azurehdinsight.net**.
2. To sign in, enter the HTTP (cluster) user name and password.
3. In the left menu, select **ZooKeeper**. Three **ZooKeeper Server** instances are listed.
4. Select one of the **ZooKeeper Server** instances. On the **Summary** pane, find the **Hostname**. It looks similar to *zk1-jdolehb.3lnng4rcvp5uzokyktxs4a5dhd.bx.internal.cloudapp.net*.

**To use SQLLine**

1. Connect to the cluster by using SSH. For more information, see [Use SSH with HDInsight](../hdinsight-hadoop-linux-use-ssh-unix.md).

2. In SSH, use the following commands to run SQLLine. Replace `<ZOOKEEPER SERVER FQDN>` with the ZooKeeper host name that you found in the previous procedure. `2181` is the ZooKeeper client port, and `/hbase-unsecure` is the root ZooKeeper node (znode) that HBase uses on HDInsight.

        cd /usr/hdp/current/phoenix-client/bin
        ./sqlline.py <ZOOKEEPER SERVER FQDN>:2181:/hbase-unsecure

3. To create an HBase table and insert some data, run the following commands. `!tables` lists the tables, and `!quit` exits SQLLine.

        CREATE TABLE Company (COMPANY_ID INTEGER PRIMARY KEY, NAME VARCHAR(225));

        !tables

        UPSERT INTO Company VALUES(1, 'Microsoft');

        SELECT * FROM Company;

        !quit
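
Phoenix has no separate `INSERT` and `UPDATE` statements; `UPSERT VALUES` inserts a row if its primary key is not present and updates it otherwise. To see that behavior without a cluster, the following Python sketch runs the same `Company` example against an in-memory SQLite database through Python's standard `sqlite3` module. This is an analog, not Phoenix: SQLite spells upsert as `INSERT ... ON CONFLICT ... DO UPDATE` (SQLite 3.24 or later), and `upsert_company` is a hypothetical helper.

```python
import sqlite3


def upsert_company(conn, company_id, name):
    # SQLite equivalent of Phoenix's
    #   UPSERT INTO Company VALUES(company_id, name);
    # insert the row, or update NAME if COMPANY_ID already exists.
    conn.execute(
        "INSERT INTO Company (COMPANY_ID, NAME) VALUES (?, ?) "
        "ON CONFLICT(COMPANY_ID) DO UPDATE SET NAME = excluded.NAME",
        (company_id, name),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Company (COMPANY_ID INTEGER PRIMARY KEY, NAME VARCHAR(225))"
    )
    upsert_company(conn, 1, "Microsoft")
    upsert_company(conn, 1, "Microsoft Corporation")  # existing key: updated in place
    upsert_company(conn, 2, "Contoso")  # new key: inserted
    rows = conn.execute(
        "SELECT COMPANY_ID, NAME FROM Company ORDER BY COMPANY_ID"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

Running the second upsert for key `1` leaves one row rather than two, which mirrors what re-running `UPSERT INTO Company VALUES(1, ...)` does in SQLLine.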

For more information, see the [SQLLine manual](http://sqlline.sourceforge.net/#manual) and [Phoenix grammar](http://phoenix.apache.org/language/index.html).

## Next steps
In this article, you learned how to use Apache Phoenix in HDInsight. To learn more, see these articles:

* [HDInsight HBase overview][hdinsight-hbase-overview].
  HBase is an Apache, open-source, NoSQL database built on Hadoop that provides random access and strong consistency for large amounts of unstructured and semistructured data.
* [Provision HBase clusters on Azure Virtual Network][hdinsight-hbase-provision-vnet].
  With virtual network integration, HBase clusters can be deployed to the same virtual network as your applications, so applications can communicate directly with HBase.
* [Configure HBase replication in HDInsight](apache-hbase-replication.md). Learn how to set up HBase replication across two Azure datacenters.

[azure-portal]: https://portal.azure.com
[vnet-point-to-site-connectivity]: https://msdn.microsoft.com/library/azure/09926218-92ab-4f43-aa99-83ab4d355555#BKMK_VNETPT

[hdinsight-manage-portal]: hdinsight-administer-use-management-portal.md#connect-to-clusters-using-rdp
[hdinsight-hbase-provision-vnet]: apache-hbase-provision-vnet.md
[hdinsight-hbase-overview]: apache-hbase-overview.md

[hdinsight-hbase-phoenix-sqlline]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-phoenix-sqlline.png
[img-certificate]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-vpn-certificate.png
[img-vnet-diagram]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-vnet-point-to-site.png
[img-squirrel-driver]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-squirrel-driver.png
[img-squirrel-alias]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-squirrel-alias.png
[img-squirrel]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-squirrel.png
[img-squirrel-sql]: ./media/hdinsight-hbase-phoenix-squirrel/hdinsight-hbase-squirrel-sql.png
