Commit 57df4ab

committed
Lines 26-55 (alpha sort)
1 parent 8e3c3d6 commit 57df4ab

20 files changed

+191
-201
lines changed

articles/hdinsight/hdinsight-apache-spark-jupyter-notebook-install-locally.md

+17-16
Original file line number | Diff line number | Diff line change
@@ -40,9 +40,9 @@ You must install Python before you can install Jupyter notebooks. Both Python a
4040

4141
1. Download the [Anaconda installer](https://www.continuum.io/downloads) for your platform and run the setup. While running the setup wizard, make sure you select the option to add Anaconda to your PATH variable.
4242
2. Run the following command to install Jupyter.
43-
43+
4444
conda install jupyter
45-
45+
4646
For more information on installing Jupyter, see [Installing Jupyter using Anaconda](http://jupyter.readthedocs.io/en/latest/install.html).
4747

4848
## Install the kernels and Spark magic
@@ -56,19 +56,19 @@ For clusters v3.5, please install sparkmagic 0.8.4 by executing `pip install spa
5656
In this section you configure the Spark magic that you installed earlier to connect to an Apache Spark cluster that you must have already created in Azure HDInsight.
5757

5858
1. The Jupyter configuration information is typically stored in the user's home directory. To locate your home directory on any OS platform, type the following commands.
59-
59+
6060
Start the Python shell. On a command window, type the following:
61-
61+
6262
python
63-
63+
6464
On the Python shell, enter the following command to find out the home directory.
65-
65+
6666
import os
6767
print(os.path.expanduser('~'))
6868

6969
2. Navigate to the home directory and create a folder called **.sparkmagic** if it does not already exist.
7070
3. Within the folder, create a file called **config.json** and add the following JSON snippet inside it.
71-
71+
7272
{
7373
"kernel_python_credentials" : {
7474
"username": "{USERNAME}",
@@ -83,7 +83,7 @@ In this section you configure the Spark magic that you installed earlier to conn
8383
}
8484

8585
4. Substitute **{USERNAME}**, **{CLUSTERDNSNAME}**, and **{BASE64ENCODEDPASSWORD}** with appropriate values. You can use a number of utilities in your favorite programming language or online to generate a base64 encoded password for your actual password. A simple Python snippet to run from your command prompt would be:
86-
86+
8787
python -c "import base64; print(base64.b64encode('{YOURPASSWORD}'))"
8888

8989
5. Configure the right Heartbeat settings in `config.json`:
@@ -100,16 +100,17 @@ In this section you configure the Spark magic that you installed earlier to conn
100100
"livy_server_heartbeat_timeout_seconds": 60,
101101
"heartbeat_retry_seconds": 1
102102

103-
>[!TIP] Heartbeats are sent to ensure that sessions are not leaked. Note that when a computer goes to sleep or is shut down, the heartbeat will not be sent, resulting in the session being cleaned up. For clusters v3.4, if you wish to disable this behavior, you can set the Livy config `livy.server.interactive.heartbeat.timeout` to `0` from the Ambari UI. For clusters v3.5, if you do not set the 3.5 configuration above, the session will not be deleted.
103+
>[!TIP]
104+
>Heartbeats are sent to ensure that sessions are not leaked. Note that when a computer goes to sleep or is shut down, the heartbeat will not be sent, resulting in the session being cleaned up. For clusters v3.4, if you wish to disable this behavior, you can set the Livy config `livy.server.interactive.heartbeat.timeout` to `0` from the Ambari UI. For clusters v3.5, if you do not set the 3.5 configuration above, the session will not be deleted.
104105
105106
6. Start Jupyter. Use the following command from the command prompt.
106-
107+
107108
jupyter notebook
108109

109110
7. Verify that you can connect to the cluster using the Jupyter notebook and that you can use the Spark magic available with the kernels. Perform the following steps.
110-
111+
111112
1. Create a new notebook. From the right-hand corner, click **New**. You should see the default kernel **Python2** and the two new kernels that you installed, **PySpark** and **Spark**.
112-
113+
113114
![Create a new Jupyter notebook](./media/hdinsight-apache-spark-jupyter-notebook-install-locally/jupyter-kernels.png "Create a new Jupyter notebook")
114115

115116
Click **PySpark**.
@@ -122,7 +123,8 @@ In this section you configure the Spark magic that you installed earlier to conn
122123

123124
If you can successfully retrieve the output, your connection to the HDInsight cluster is tested.
124125

125-
>[!TIP] If you want to update the notebook configuration to connect to a different cluster, update the config.json with the new set of values, as shown in Step 3 above.
126+
>[!TIP]
127+
>If you want to update the notebook configuration to connect to a different cluster, update the config.json with the new set of values, as shown in Step 3 above.
126128
127129
## Why should I install Jupyter on my computer?
128130
There can be a number of reasons why you might want to install Jupyter on your computer and then connect it to a Spark cluster on HDInsight.
@@ -135,8 +137,8 @@ There can be a number of reasons why you might want to install Jupyter on your c
135137

136138
> [!WARNING]
137139
> With Jupyter installed on your local computer, multiple users can run the same notebook on the same Spark cluster at the same time. In such a situation, multiple Livy sessions are created. If you run into an issue and want to debug that, it will be a complex task to track which Livy session belongs to which user.
138-
>
139-
>
140+
>
141+
>
140142
141143
## <a name="seealso"></a>See also
142144
* [Overview: Apache Spark on Azure HDInsight](hdinsight-apache-spark-overview.md)
@@ -162,4 +164,3 @@ There can be a number of reasons why you might want to install Jupyter on your c
162164
### Manage resources
163165
* [Manage resources for the Apache Spark cluster in Azure HDInsight](hdinsight-apache-spark-resource-manager.md)
164166
* [Track and debug jobs running on an Apache Spark cluster in HDInsight](hdinsight-apache-spark-job-debugging.md)
165-
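
Reviewer sketch for steps 1 through 4 of this file (not part of the commit). The one-liner `base64.b64encode('{YOURPASSWORD}')` shown in step 4 works on Python 2, where a plain string literal is a byte string. On Python 3, `b64encode` expects a bytes-like object, so the password has to be encoded first. The helper names `encode_password` and `sparkmagic_config_path` below are hypothetical and exist only for this illustration; UTF-8 is an assumed encoding, and the sketch deliberately does not write `config.json` itself because the full JSON body is not visible in this diff.

```python
import base64
import os
from typing import Optional


def encode_password(password: str) -> str:
    """Return the base64 text form of a password.

    On Python 3 the string is encoded to bytes first (UTF-8 is an
    assumption here), then the base64 bytes are decoded back to text.
    """
    return base64.b64encode(password.encode("utf-8")).decode("ascii")


def sparkmagic_config_path(home: Optional[str] = None) -> str:
    """Create <home>/.sparkmagic if it is missing and return the
    path where config.json is expected to live."""
    # Same lookup the article uses to find the home directory.
    home = home or os.path.expanduser("~")
    folder = os.path.join(home, ".sparkmagic")
    os.makedirs(folder, exist_ok=True)  # no error if it already exists
    return os.path.join(folder, "config.json")
```

Calling `encode_password("your password")` yields the value to substitute for **{BASE64ENCODEDPASSWORD}**, and `sparkmagic_config_path()` returns the location for the file described in step 3.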

articles/hdinsight/hdinsight-domain-joined-introduction.md

+4-5
@@ -22,9 +22,9 @@ ms.author: saurinsh
2222
Until now, Azure HDInsight supported only a single local admin user. This worked well for smaller application teams or departments. As Hadoop-based workloads gained popularity in the enterprise sector, the need for enterprise-grade capabilities like Active Directory-based authentication, multi-user support, and role-based access control became increasingly important. Using Domain-joined HDInsight clusters, you can create an HDInsight cluster joined to an Active Directory domain and configure a list of employees from the enterprise who can authenticate through Azure Active Directory to log on to the HDInsight cluster. Anyone outside the enterprise cannot log on to or access the HDInsight cluster. The enterprise admin can configure role-based access control for Hive security using [Apache Ranger](http://hortonworks.com/apache/ranger/), thus restricting access to data to only as much as needed. Finally, the admin can audit data access by employees, and any changes made to access control policies, thus achieving a high degree of governance of their corporate resources.
2323

2424
> [!NOTE]
25-
> The new features described in this preview are available only on Linux-based HDInsight clusters for Hive workload. The other workloads, such as HBase, Spark, Storm and Kafka, will be enabled in future releases.
26-
>
27-
>
25+
> The new features described in this preview are available only on Linux-based HDInsight clusters for Hive workload. The other workloads, such as HBase, Spark, Storm and Kafka, will be enabled in future releases.
26+
>
27+
>
2828
2929
## Benefits
3030
Enterprise Security contains four big pillars – Perimeter Security, Authentication, Authorization, and Encryption.
@@ -50,5 +50,4 @@ Protecting data is important for meeting organizational security and compliance
5050
* For configuring a Domain-joined HDInsight cluster, see [Configure Domain-joined HDInsight clusters](hdinsight-domain-joined-configure.md).
5151
* For managing Domain-joined HDInsight clusters, see [Manage Domain-joined HDInsight clusters](hdinsight-domain-joined-manage.md).
5252
* For configuring Hive policies and running Hive queries, see [Configure Hive policies for Domain-joined HDInsight clusters](hdinsight-domain-joined-run-hive.md).
53-
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#connect-to-a-domain-joined-hdinsight-cluster).
54-
53+
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#domain-joined).

articles/hdinsight/hdinsight-domain-joined-manage.md

+16-17
@@ -31,14 +31,14 @@ A domain-joined HDInsight cluster has three new users in addition to Ambari Admi
3131

3232
* **Ranger admin**: This account is the local Apache Ranger admin account. It is not an active directory domain user. This account can be used to set up policies and make other users admins or delegated admins (so that those users can manage policies). By default, the username is *admin* and the password is the same as the Ambari admin password. The password can be updated from the Settings page in Ranger.
3333
* **Cluster admin domain user**: This account is an active directory domain user designated as the Hadoop cluster admin including Ambari and Ranger. You must provide this user’s credentials during cluster creation. This user has the following privileges:
34-
34+
3535
* Join machines to the domain and place them within the OU that you specify during cluster creation.
36-
* Create service principals within the OU that you specify during cluster creation.
36+
* Create service principals within the OU that you specify during cluster creation.
3737
* Create reverse DNS entries.
38-
39-
Note the other AD users also have these privileges.
40-
41-
There are some end points within the cluster (for example, Templeton) which are not managed by Ranger, and hence are not secure. These end points are locked down for all users except the cluster admin domain user.
38+
39+
Note the other AD users also have these privileges.
40+
41+
There are some end points within the cluster (for example, Templeton) which are not managed by Ranger, and hence are not secure. These end points are locked down for all users except the cluster admin domain user.
4242
* **Regular**: During cluster creation, you can provide multiple active directory groups. The users in these groups will be synced to Ranger and Ambari. These users are domain users and will have access to only Ranger-managed endpoints (for example, Hiveserver2). All the RBAC policies and auditing will be applicable to these users.
4343

4444
## Roles of Domain-joined HDInsight clusters
@@ -55,7 +55,7 @@ Domain-joined HDInsight have the following roles:
5555
1. Open the Ambari Management UI. See [Open the Ambari Management UI](#open-the-ambari-management-ui).
5656
2. From the left menu, click **Roles**.
5757
3. Click the blue question mark to see the permissions:
58-
58+
5959
![Domain-joined HDInsight roles permissions](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-roles-permissions.png)
6060

6161
## Open the Ambari Management UI
@@ -64,36 +64,36 @@ Domain-joined HDInsight have the following roles:
6464
3. Click **Dashboard** from the top menu to open Ambari.
6565
4. Log on to Ambari using the cluster administrator domain user name and password.
6666
5. Click the **Admin** dropdown menu from the upper right corner, and then click **Manage Ambari**.
67-
67+
6868
![Domain-joined HDInsight manage Ambari](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-manage-ambari.png)
69-
69+
7070
The UI looks like:
71-
71+
7272
![Domain-joined HDInsight Ambari management UI](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-ambari-management-ui.png)
7373

7474
## List the domain users synchronized from your Active Directory
7575
1. Open the Ambari Management UI. See [Open the Ambari Management UI](#open-the-ambari-management-ui).
7676
2. From the left menu, click **Users**. You shall see all the users synced from your Active Directory to the HDInsight cluster.
77-
77+
7878
![Domain-joined HDInsight Ambari management UI list users](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-ambari-management-ui-users.png)
7979

8080
## List the domain groups synchronized from your Active Directory
8181
1. Open the Ambari Management UI. See [Open the Ambari Management UI](#open-the-ambari-management-ui).
8282
2. From the left menu, click **Groups**. You shall see all the groups synced from your Active Directory to the HDInsight cluster.
83-
83+
8484
![Domain-joined HDInsight Ambari management UI list groups](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-ambari-management-ui-groups.png)
8585

8686
## Configure Hive Views permissions
8787
1. Open the Ambari Management UI. See [Open the Ambari Management UI](#open-the-ambari-management-ui).
8888
2. From the left menu, click **Views**.
8989
3. Click **HIVE** to show the details.
90-
90+
9191
![Domain-joined HDInsight Ambari management UI Hive Views](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-ambari-management-ui-hive-views.png)
9292
4. Click the **Hive View** link to configure Hive Views.
9393
5. Scroll down to the **Permissions** section.
94-
94+
9595
![Domain-joined HDInsight Ambari management UI Hive Views configure permissions](./media/hdinsight-domain-joined-manage/hdinsight-domain-joined-ambari-management-ui-hive-views-permissions.png)
96-
6. Click **Add User** or **Add Group**, and then specify the users or groups that can use Hive Views.
96+
6. Click **Add User** or **Add Group**, and then specify the users or groups that can use Hive Views.
9797

9898
## Configure users for the roles
9999
To see a list of roles and their permissions, see [Roles of Domain-joined HDInsight clusters](#roles-of-domain---joined-hdinsight-clusters).
@@ -105,5 +105,4 @@ Domain-joined HDInsight have the following roles:
105105
## Next steps
106106
* For configuring a Domain-joined HDInsight cluster, see [Configure Domain-joined HDInsight clusters](hdinsight-domain-joined-configure.md).
107107
* For configuring Hive policies and running Hive queries, see [Configure Hive policies for Domain-joined HDInsight clusters](hdinsight-domain-joined-run-hive.md).
108-
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#connect-to-a-domain-joined-hdinsight-cluster).
109-
108+
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#domain-joined).

articles/hdinsight/hdinsight-domain-joined-run-hive.md

+24-25
@@ -28,16 +28,16 @@ Learn how to configure Apache Ranger policies for Hive. In this article, you cre
2828
## Connect to Apache Ranger Admin UI
2929
**To connect to Ranger Admin UI**
3030

31-
1. From a browser, connect to Ranger Admin UI. The URL is https://&lt;ClusterName>.azurehdinsight.net/Ranger/.
32-
31+
1. From a browser, connect to Ranger Admin UI. The URL is https://&lt;ClusterName>.azurehdinsight.net/Ranger/.
32+
3333
> [!NOTE]
3434
> Ranger uses different credentials than the Hadoop cluster. To prevent browsers from using cached Hadoop credentials, use a new InPrivate browser window to connect to the Ranger Admin UI.
35-
>
36-
>
35+
>
36+
>
3737
2. Log in using the cluster administrator domain user name and password:
38-
38+
3939
![HDInsight Domain-joined Ranger home page](./media/hdinsight-domain-joined-run-hive/hdinsight-domain-joined-ranger-home-page.png)
40-
40+
4141
Currently, Ranger only works with Yarn and Hive.
4242

4343
## Create Domain users
@@ -51,23 +51,23 @@ In this section, you will create two Ranger policies for accessing hivesampletab
5151
1. Open Ranger Admin UI. See [Connect to Apache Ranger Admin UI](#connect-to-apache-ranger-admin-ui).
5252
2. Click **&lt;ClusterName>_hive**, under **Hive**. You shall see two preconfigured policies.
5353
3. Click **Add New Policy**, and then enter the following values:
54-
54+
5555
* Policy name: read-hivesampletable-all
5656
* Hive Database: default
5757
* table: hivesampletable
5858
* Hive column: *
5959
* Select User: hiveuser1
6060
* Permissions: select
61-
61+
6262
![HDInsight Domain-joined Ranger Hive policy configure](./media/hdinsight-domain-joined-run-hive/hdinsight-domain-joined-configure-ranger-policy.png).
63-
63+
6464
> [!NOTE]
6565
> If a domain user is not populated in Select User, wait a few moments for Ranger to sync with AAD.
66-
>
67-
>
66+
>
67+
>
6868
4. Click **Add** to save the policy.
6969
5. Repeat the last two steps to create another policy with the following properties:
70-
70+
7171
* Policy name: read-hivesampletable-devicemake
7272
* Hive Database: default
7373
* table: hivesampletable
@@ -98,20 +98,20 @@ In the last section, you have configured two policies. hiveuser1 has the select
9898

9999
1. Open a new or existing workbook in Excel.
100100
2. From the **Data** tab, click **From Other Data Sources**, and then click **From Data Connection Wizard** to launch the **Data Connection Wizard**.
101-
101+
102102
![Open data connection wizard][img-hdi-simbahiveodbc.excel.dataconnection]
103103
3. Select **ODBC DSN** as the data source, and then click **Next**.
104104
4. From ODBC data sources, select the data source name that you created in the previous step, and then click **Next**.
105105
5. Re-enter the password for the cluster in the wizard, and then click **OK**. Wait for the **Select Database and Table** dialog to open. This can take a few seconds.
106-
6. Select **hivesampletable**, and then click **Next**.
106+
6. Select **hivesampletable**, and then click **Next**.
107107
7. Click **Finish**.
108-
8. In the **Import Data** dialog, you can change or specify the query. To do so, click **Properties**. This can take a few seconds.
108+
8. In the **Import Data** dialog, you can change or specify the query. To do so, click **Properties**. This can take a few seconds.
109109
9. Click the **Definition** tab. The command text is:
110-
110+
111111
SELECT * FROM "HIVE"."default"."hivesampletable"
112-
112+
113113
Under the Ranger policies you defined, hiveuser1 has select permission on all the columns. So this query works with hiveuser1's credentials, but it does not work with hiveuser2's credentials.
114-
114+
115115
![Connection Properties][img-hdi-simbahiveodbc-excel-connectionproperties]
116116
10. Click **OK** to close the Connection Properties dialog.
117117
11. Click **OK** to close the **Import Data** dialog.
@@ -121,23 +121,22 @@ To test the second policy (read-hivesampletable-devicemake) you created in the l
121121

122122
1. Add a new sheet in Excel.
123123
2. Follow the last procedure to import the data. The only change you will make is to use hiveuser2's credentials instead of hiveuser1's. This will fail because hiveuser2 only has permission to see two columns. You shall get the following error:
124-
124+
125125
[Microsoft][HiveODBC] (35) Error from Hive: error code: '40000' error message: 'Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hiveuser2] does not have [SELECT] privilege on [default/hivesampletable/clientid,country ...]'.
126126
3. Follow the same procedure to import data. This time, use hiveuser2's credentials, and also modify the select statement from:
127-
127+
128128
SELECT * FROM "HIVE"."default"."hivesampletable"
129-
129+
130130
to:
131-
131+
132132
SELECT clientid, devicemake FROM "HIVE"."default"."hivesampletable"
133-
133+
134134
When it is done, you shall see two columns of data imported.
135135

136136
## Next steps
137137
* For configuring a Domain-joined HDInsight cluster, see [Configure Domain-joined HDInsight clusters](hdinsight-domain-joined-configure.md).
138138
* For managing Domain-joined HDInsight clusters, see [Manage Domain-joined HDInsight clusters](hdinsight-domain-joined-manage.md).
139-
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#connect-to-a-domain-joined-hdinsight-cluster).
139+
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#domain-joined).
140140
* For connecting to Hive using Hive JDBC, see [Connect to Hive on Azure HDInsight using the Hive JDBC driver](hdinsight-connect-hive-jdbc-driver.md)
141141
* For connecting Excel to Hadoop using Hive ODBC, see [Connect Excel to Hadoop with the Microsoft Hive ODBC driver](hdinsight-connect-excel-hive-odbc-driver.md)
142142
* For connecting Excel to Hadoop using Power Query, see [Connect Excel to Hadoop by using Power Query](hdinsight-connect-excel-power-query.md)
143-
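
Reviewer sketch for the two policies tested in this file (not part of the commit). It is a toy model of column-level access, assuming only what the walkthrough states: hiveuser1 may select every column of hivesampletable, and hiveuser2 may select only `clientid` and `devicemake`. It is not Ranger's API or its evaluation logic; `POLICIES`, `can_select`, and the three-column `TABLE_COLUMNS` list are hypothetical names and an illustrative subset of the table's columns.

```python
from typing import Dict, List, Set

ALL_COLUMNS = "*"

# Hypothetical stand-in for the two policies created in the walkthrough.
POLICIES: Dict[str, Set[str]] = {
    "hiveuser1": {ALL_COLUMNS},               # read-hivesampletable-all
    "hiveuser2": {"clientid", "devicemake"},  # read-hivesampletable-devicemake
}

# Illustrative subset of hivesampletable's columns (an assumption).
TABLE_COLUMNS: List[str] = ["clientid", "country", "devicemake"]


def can_select(user: str, columns: List[str]) -> bool:
    """Return True if every requested column is covered by the user's policy."""
    allowed = POLICIES.get(user, set())
    if ALL_COLUMNS in allowed:
        return True
    # SELECT * expands to every column in the table before the check.
    requested = TABLE_COLUMNS if columns == [ALL_COLUMNS] else columns
    return set(requested) <= allowed
```

In this model, `can_select("hiveuser2", ["*"])` is rejected because `*` expands to columns outside the policy, which mirrors the permission-denied error in step 2, while `can_select("hiveuser2", ["clientid", "devicemake"])` is allowed, matching the rewritten query in step 3.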
