articles/hdinsight/hdinsight-apache-spark-jupyter-notebook-install-locally.md (+17 −16)
@@ -40,9 +40,9 @@ You must install Python before you can install Jupyter notebooks. Both Python a
1. Download the [Anaconda installer](https://www.continuum.io/downloads) for your platform and run the setup. While running the setup wizard, make sure you select the option to add Anaconda to your PATH variable.
2. Run the following command to install Jupyter.

        conda install jupyter

    For more information on installing Jupyter, see [Installing Jupyter using Anaconda](http://jupyter.readthedocs.io/en/latest/install.html).
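To sanity-check the installation from Python, a quick sketch (not part of the official steps; the `notebook` module name is assumed from the Jupyter metapackage):

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# 'os' ships with Python, so this always prints True.
print(is_installed("os"))
# After `conda install jupyter`, the notebook package should also be present.
print(is_installed("notebook"))
```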
## Install the kernels and Spark magic
@@ -56,19 +56,19 @@ For clusters v3.5, please install sparkmagic 0.8.4 by executing `pip install spa
In this section, you configure the Spark magic that you installed earlier to connect to an Apache Spark cluster that you must have already created in Azure HDInsight.

1. The Jupyter configuration information is typically stored in the user's home directory. To locate your home directory on any OS platform, type the following commands.

    Start the Python shell. On a command window, type the following:

        python

    On the Python shell, enter the following command to find out the home directory.

        import os
        print(os.path.expanduser('~'))

2. Navigate to the home directory and create a folder called **.sparkmagic** if it does not already exist.
3. Within the folder, create a file called **config.json** and add the following JSON snippet inside it.

        {
          "kernel_python_credentials" : {
            "username": "{USERNAME}",
@@ -83,7 +83,7 @@ In this section you configure the Spark magic that you installed earlier to conn
        }

4. Substitute **{USERNAME}**, **{CLUSTERDNSNAME}**, and **{BASE64ENCODEDPASSWORD}** with appropriate values. You can use a number of utilities in your favorite programming language, or online, to generate a base64 encoded password for your actual password. A simple Python snippet to run from your command prompt would be:
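For example, one way to produce the base64 string with Python's standard library (a sketch; `MyClusterPassword` is a placeholder for your real password):

```python
import base64

# Placeholder password -- replace with your actual cluster login password.
password = "MyClusterPassword"

# base64-encode the UTF-8 bytes and decode back to a printable string.
encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
print(encoded)
```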
5. Configure the right Heartbeat settings in `config.json`:
@@ -100,16 +100,17 @@ In this section you configure the Spark magic that you installed earlier to conn
        "livy_server_heartbeat_timeout_seconds": 60,
        "heartbeat_retry_seconds": 1

    >[!TIP]
    >Heartbeats are sent to ensure that sessions are not leaked. Note that when a computer goes to sleep or is shut down, the heartbeat will not be sent, resulting in the session being cleaned up. For clusters v3.4, if you wish to disable this behavior, you can set the Livy config `livy.server.interactive.heartbeat.timeout` to `0` from the Ambari UI. For clusters v3.5, if you do not set the 3.5 configuration above, the session will not be deleted.

6. Start Jupyter. Use the following command from the command prompt.

        jupyter notebook

7. Verify that you can connect to the cluster using the Jupyter notebook and that you can use the Spark magic available with the kernels. Perform the following steps.

    1. Create a new notebook. From the right-hand corner, click **New**. You should see the default kernel **Python2** and the two new kernels that you installed, **PySpark** and **Spark**.

        

    Click **PySpark**.
@@ -122,7 +123,8 @@ In this section you configure the Spark magic that you installed earlier to conn
    If you can successfully retrieve the output, your connection to the HDInsight cluster is tested.

    >[!TIP]
    >If you want to update the notebook configuration to connect to a different cluster, update the config.json with the new set of values, as shown in Step 3 above.
## Why should I install Jupyter on my computer?
There can be a number of reasons why you might want to install Jupyter on your computer and then connect it to a Spark cluster on HDInsight.
@@ -135,8 +137,8 @@ There can be a number of reasons why you might want to install Jupyter on your c
> [!WARNING]
> With Jupyter installed on your local computer, multiple users can run the same notebook on the same Spark cluster at the same time. In such a situation, multiple Livy sessions are created. If you run into an issue and want to debug it, tracking which Livy session belongs to which user will be a complex task.
>
>
## <a name="seealso"></a>See also
* [Overview: Apache Spark on Azure HDInsight](hdinsight-apache-spark-overview.md)
@@ -162,4 +164,3 @@ There can be a number of reasons why you might want to install Jupyter on your c
### Manage resources
* [Manage resources for the Apache Spark cluster in Azure HDInsight](hdinsight-apache-spark-resource-manager.md)
* [Track and debug jobs running on an Apache Spark cluster in HDInsight](hdinsight-apache-spark-job-debugging.md)
articles/hdinsight/hdinsight-domain-joined-introduction.md (+4 −5)
@@ -22,9 +22,9 @@ ms.author: saurinsh
Until today, Azure HDInsight supported only a single-user local admin. This worked well for smaller application teams or departments. As Hadoop-based workloads gained popularity in the enterprise sector, the need for enterprise-grade capabilities like Active Directory-based authentication, multi-user support, and role-based access control became increasingly important. Using Domain-joined HDInsight clusters, you can create an HDInsight cluster joined to an Active Directory domain and configure a list of employees from the enterprise who can authenticate through Azure Active Directory to log on to the HDInsight cluster. Anyone outside the enterprise cannot log on to or access the HDInsight cluster. The enterprise admin can configure role-based access control for Hive security using [Apache Ranger](http://hortonworks.com/apache/ranger/), thus restricting access to data to only as much as needed. Finally, the admin can audit the data access by employees and any changes made to access control policies, achieving a high degree of governance of corporate resources.

> [!NOTE]
> The new features described in this preview are available only on Linux-based HDInsight clusters for the Hive workload. The other workloads, such as HBase, Spark, Storm, and Kafka, will be enabled in future releases.
>
>
## Benefits
Enterprise Security contains four big pillars – Perimeter Security, Authentication, Authorization, and Encryption.
@@ -50,5 +50,4 @@ Protecting data is important for meeting organizational security and compliance
* For configuring a Domain-joined HDInsight cluster, see [Configure Domain-joined HDInsight clusters](hdinsight-domain-joined-configure.md).
* For managing Domain-joined HDInsight clusters, see [Manage Domain-joined HDInsight clusters](hdinsight-domain-joined-manage.md).
* For configuring Hive policies and running Hive queries, see [Configure Hive policies for Domain-joined HDInsight clusters](hdinsight-domain-joined-run-hive.md).
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#domain-joined).
articles/hdinsight/hdinsight-domain-joined-manage.md (+16 −17)
@@ -31,14 +31,14 @@ A domain-joined HDInsight cluster has three new users in addition to Ambari Admi
* **Ranger admin**: This account is the local Apache Ranger admin account. It is not an Active Directory domain user. This account can be used to set up policies and make other users admins or delegated admins (so that those users can manage policies). By default, the username is *admin* and the password is the same as the Ambari admin password. The password can be updated from the Settings page in Ranger.
* **Cluster admin domain user**: This account is an Active Directory domain user designated as the Hadoop cluster admin, including Ambari and Ranger. You must provide this user's credentials during cluster creation. This user has the following privileges:

  * Join machines to the domain and place them within the OU that you specify during cluster creation.
  * Create service principals within the OU that you specify during cluster creation.
  * Create reverse DNS entries.

  Note that the other AD users also have these privileges.

  There are some endpoints within the cluster (for example, Templeton) that are not managed by Ranger and hence are not secure. These endpoints are locked down for all users except the cluster admin domain user.
* **Regular**: During cluster creation, you can provide multiple Active Directory groups. The users in these groups are synced to Ranger and Ambari. These users are domain users and have access only to Ranger-managed endpoints (for example, Hiveserver2). All the RBAC policies and auditing apply to these users.

## Roles of Domain-joined HDInsight clusters
@@ -55,7 +55,7 @@ Domain-joined HDInsight have the following roles:
1. Open the Ambari Management UI. See [Open the Ambari Management UI](#open-the-ambari-management-ui).
2. From the left menu, click **Roles**.
3. Click the blue question mark to see the permissions:
6. Click **Add User** or **Add Group**, and then specify the users or groups that can use Hive Views.

## Configure users for the roles
To see a list of roles and their permissions, see [Roles of Domain-joined HDInsight clusters](#roles-of-domain---joined-hdinsight-clusters).
@@ -105,5 +105,4 @@ Domain-joined HDInsight have the following roles:
## Next steps
* For configuring a Domain-joined HDInsight cluster, see [Configure Domain-joined HDInsight clusters](hdinsight-domain-joined-configure.md).
* For configuring Hive policies and running Hive queries, see [Configure Hive policies for Domain-joined HDInsight clusters](hdinsight-domain-joined-run-hive.md).
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#domain-joined).
articles/hdinsight/hdinsight-domain-joined-run-hive.md (+24 −25)
@@ -28,16 +28,16 @@ Learn how to configure Apache Ranger policies for Hive. In this article, you cre
## Connect to Apache Ranger Admin UI
**To connect to Ranger Admin UI**

1. From a browser, connect to the Ranger Admin UI. The URL is https://<ClusterName>.azurehdinsight.net/Ranger/.

   > [!NOTE]
   > Ranger uses different credentials than the Hadoop cluster. To prevent browsers from using cached Hadoop credentials, use a new InPrivate browser window to connect to the Ranger Admin UI.
   >
   >
2. Log in using the cluster administrator domain user name and password:

   

   Currently, Ranger only works with Yarn and Hive.

## Create Domain users
@@ -51,23 +51,23 @@ In this section, you will create two Ranger policies for accessing hivesampletab
1. Open the Ranger Admin UI. See [Connect to Apache Ranger Admin UI](#connect-to-apache-ranger-admin-ui).
2. Click **<ClusterName>_hive** under **Hive**. You will see two pre-configured policies.
3. Click **Add New Policy**, and then enter the following values:
   > If a domain user is not populated in Select User, wait a few moments for Ranger to sync with AAD.
   >
   >
4. Click **Add** to save the policy.
5. Repeat the last two steps to create another policy with the following properties:

   * Policy name: read-hivesampletable-devicemake
   * Hive Database: default
   * table: hivesampletable
@@ -98,20 +98,20 @@ In the last section, you have configured two policies. hiveuser1 has the select
1. Open a new or existing workbook in Excel.
2. From the **Data** tab, click **From Other Data Sources**, and then click **From Data Connection Wizard** to launch the **Data Connection Wizard**.

   ![Open data connection wizard][img-hdi-simbahiveodbc.excel.dataconnection]
3. Select **ODBC DSN** as the data source, and then click **Next**.
4. From ODBC data sources, select the data source name that you created in the previous step, and then click **Next**.
5. Re-enter the password for the cluster in the wizard, and then click **OK**. Wait for the **Select Database and Table** dialog to open. This can take a few seconds.
6. Select **hivesampletable**, and then click **Next**.
7. Click **Finish**.
8. In the **Import Data** dialog, you can change or specify the query. To do so, click **Properties**. This can take a few seconds.
9. Click the **Definition** tab. The command text is:

        SELECT * FROM "HIVE"."default"."hivesampletable"

   By the Ranger policies you defined, hiveuser1 has select permission on all the columns. So this query works with hiveuser1's credentials, but it does not work with hiveuser2's credentials.
10. Click **OK** to close the Connection Properties dialog.
11. Click **OK** to close the **Import Data** dialog.
@@ -121,23 +121,22 @@ To test the second policy (read-hivesampletable-devicemake) you created in the l
1. Add a new sheet in Excel.
2. Follow the last procedure to import the data. The only change you will make is to use hiveuser2's credentials instead of hiveuser1's. This will fail because hiveuser2 only has permission to see two columns. You will get the following error:

        [Microsoft][HiveODBC] (35) Error from Hive: error code: '40000' error message: 'Error while compiling statement: FAILED: HiveAccessControlException Permission denied: user [hiveuser2] does not have [SELECT] privilege on [default/hivesampletable/clientid,country ...]'.
3. Follow the same procedure to import data. This time, use hiveuser2's credentials, and also modify the select statement from:

        SELECT * FROM "HIVE"."default"."hivesampletable"

   to:

        SELECT clientid, devicemake FROM "HIVE"."default"."hivesampletable"

   When it is done, you will see two columns of data imported.

## Next steps
* For configuring a Domain-joined HDInsight cluster, see [Configure Domain-joined HDInsight clusters](hdinsight-domain-joined-configure.md).
* For managing Domain-joined HDInsight clusters, see [Manage Domain-joined HDInsight clusters](hdinsight-domain-joined-manage.md).
* For running Hive queries using SSH on Domain-joined HDInsight clusters, see [Use SSH with Linux-based Hadoop on HDInsight from Linux, Unix, or OS X](hdinsight-hadoop-linux-use-ssh-unix.md#domain-joined).
* For connecting to Hive using Hive JDBC, see [Connect to Hive on Azure HDInsight using the Hive JDBC driver](hdinsight-connect-hive-jdbc-driver.md)
* For connecting Excel to Hadoop using Hive ODBC, see [Connect Excel to Hadoop with the Microsoft Hive ODBC driver](hdinsight-connect-excel-hive-odbc-driver.md)
* For connecting Excel to Hadoop using Power Query, see [Connect Excel to Hadoop by using Power Query](hdinsight-connect-excel-power-query.md)