Commit d0e3d90

adwk67, maltesander, fhennig and sbernauer authored
docs: detail how to mount and use external drivers (#449)
* docs: detail how to mount and use external drivers * added note to requirements page, corrected image tag * changelog * typo * Update docs/modules/hive/pages/usage-guide/database-driver-example.adoc Co-authored-by: Malte Sander <contact@maltesander.com> * initialize SQL in products * review feedback * Update docs/modules/hive/pages/required-external-components.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> * Update docs/modules/hive/pages/usage-guide/database-driver.adoc Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de> * added dockerfile alternative * change to path --------- Co-authored-by: Malte Sander <contact@maltesander.com> Co-authored-by: Felix Hennig <fhennig@users.noreply.github.com> Co-authored-by: Sebastian Bernauer <sebastian.bernauer@stackable.de>
1 parent 364af3e commit d0e3d90

4 files changed

Lines changed: 238 additions & 0 deletions

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -4,6 +4,10 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- Added documentation/tutorial on using external database drivers ([#449]).
+
 ### Changed
 
 - BREAKING: Switch to new image that only contains HMS.
@@ -12,6 +16,7 @@ All notable changes to this project will be documented in this file.
 `metastore-log4j2.properties` ([#447]).
 
 [#447]: https://github.com/stackabletech/hive-operator/pull/447
+[#449]: https://github.com/stackabletech/hive-operator/pull/449
 
 ## [24.3.0] - 2024-03-20
 

docs/modules/hive/pages/required-external-components.adoc

Lines changed: 3 additions & 0 deletions
@@ -8,3 +8,6 @@ The Hive Metastore requires a backend SQL database. Supported databases and vers
 * MS SQL Server 2008 R2 and above
 
 Reference: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration#AdminManualMetastoreAdministration-SupportedBackendDatabasesforMetastore[Hive Metastore documentation]
+
+The Stackable product images for Apache Hive come with built-in support for PostgreSQL.
+See xref:usage-guide/database-driver.adoc[] for details on how to make drivers for other databases (supported by Hive) available.
docs/modules/hive/pages/usage-guide/database-driver.adoc

Lines changed: 229 additions & 0 deletions
@@ -0,0 +1,229 @@
= Database drivers

The Stackable product images for Apache Hive come with built-in support for using PostgreSQL as the metastore database.
The MySQL driver is not shipped in our images due to licensing issues.
To use another supported database, the relevant driver must be made available to Hive. This tutorial shows how to do this for MySQL.

== Install the MySQL Helm chart

[source,bash]
----
helm install mysql oci://registry-1.docker.io/bitnamicharts/mysql \
  --set auth.database=hive \
  --set auth.username=hive \
  --set auth.password=hive
----
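
Before continuing, it can help to wait until the MySQL pod is ready to accept connections. The following is a minimal sketch and not part of the tested tutorial: it assumes that the chart labels its pods with `app.kubernetes.io/instance=mysql` (the usual Helm convention for a release named `mysql`). The function name `wait_for_mysql` and the timeout are illustrative.

[source,bash]
----
# Hypothetical helper: block until the pods of the Helm release "mysql" are Ready.
# Assumes the chart sets the label app.kubernetes.io/instance to the release name.
wait_for_mysql() {
  kubectl wait --for=condition=ready pod \
    -l app.kubernetes.io/instance=mysql \
    --timeout=300s
}
----

Call `wait_for_mysql` once `helm install` has returned. If the chart version in use labels its pods differently, adjust the selector accordingly.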

== Download the driver to a PersistentVolumeClaim

.Create a PersistentVolumeClaim
[source,yaml]
----
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-hive-drivers
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
----

Download the driver, e.g. from https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.31/[Maven Central], to a volume backed by the PVC:

.Download the driver
[source,yaml]
----
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pvc-hive-job
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: external-drivers
          persistentVolumeClaim:
            claimName: pvc-hive-drivers
      initContainers:
        - name: dest-dir
          image: docker.stackable.tech/stackable/tools:1.0.0-stackable24.3.0
          env:
            - name: DEST_DIR
              value: "/stackable/externals"
          command:
            [
              "bash",
              "-x",
              "-c",
              "mkdir -p ${DEST_DIR} && chown stackable:stackable ${DEST_DIR} && chmod -R a=,u=rwX,g=rwX ${DEST_DIR}",
            ]
          securityContext:
            runAsUser: 0
          volumeMounts:
            - name: external-drivers
              mountPath: /stackable/externals
      containers:
        - name: hive-driver
          image: docker.stackable.tech/stackable/tools:1.0.0-stackable24.3.0
          env:
            - name: DEST_DIR
              value: "/stackable/externals"
          command:
            [
              "bash",
              "-x",
              "-c",
              "curl -L https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.31/mysql-connector-j-8.0.31.jar \
              -o ${DEST_DIR}/mysql-connector-j-8.0.31.jar",
            ]
          volumeMounts:
            - name: external-drivers
              mountPath: /stackable/externals
----

Because the Job writes the driver to the root of the volume, the driver will be available at `/stackable/external-drivers/mysql-connector-j-8.0.31.jar` when the volume `external-drivers` is later mounted at `/stackable/external-drivers`.
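
The same volume is mounted at two different paths in this guide: the download Job mounts it at `/stackable/externals`, while the Hive metastore mounts it at `/stackable/external-drivers`. The sketch below spells out how the jar name, the download URL and the two resulting paths relate to the driver version. The variable names are illustrative only; the version, URL layout and mount paths are taken from the examples in this guide.

[source,bash]
----
# Illustrative variables; only the version, URL layout and mount paths
# come from the examples in this guide.
DRIVER_VERSION="8.0.31"
DRIVER_JAR="mysql-connector-j-${DRIVER_VERSION}.jar"
DRIVER_URL="https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/${DRIVER_VERSION}/${DRIVER_JAR}"

JOB_MOUNT="/stackable/externals"          # mountPath used by the download Job
HIVE_MOUNT="/stackable/external-drivers"  # mountPath used by the metastore Pod

echo "download from: ${DRIVER_URL}"
echo "written to:    ${JOB_MOUNT}/${DRIVER_JAR}"
echo "read from:     ${HIVE_MOUNT}/${DRIVER_JAR}"
----

When switching to a different driver version, the jar name has to change consistently in the download Job, the verification Job and the value of `METASTORE_AUX_JARS_PATH`.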

Once the Job above has completed successfully, you can confirm that the driver is in the expected location by running another Job:

[source,yaml]
----
---
apiVersion: batch/v1
kind: Job
metadata:
  name: list-drivers-job
spec:
  template:
    spec:
      restartPolicy: Never
      volumes:
        - name: external-drivers
          persistentVolumeClaim:
            claimName: pvc-hive-drivers
      containers:
        - name: hive-driver
          image: docker.stackable.tech/stackable/tools:1.0.0-stackable24.3.0
          env:
            - name: DEST_DIR
              value: "/stackable/externals"
          command:
            [
              "bash",
              "-x",
              "-o",
              "pipefail",
              "-c",
              "stat ${DEST_DIR}/mysql-connector-j-8.0.31.jar",
            ]
          volumeMounts:
            - name: external-drivers
              mountPath: /stackable/externals
----
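
To see the result of either Job from the command line, one option is to wait for its completion and then print its logs. This is a hedged sketch rather than part of the tested tutorial: the helper name `job_result` and the default timeout of 120 seconds are arbitrary choices.

[source,bash]
----
# Hypothetical helper: wait for a Job to finish, then print its logs.
# $1 is the Job name; $2 is an optional timeout (default 120s, an arbitrary choice).
job_result() {
  kubectl wait --for=condition=complete "job/$1" --timeout="${2:-120s}" \
    && kubectl logs "job/$1"
}
----

For example, `job_result pvc-hive-job` waits for the download, and `job_result list-drivers-job` should then show the output of `stat` for the driver file. If a Job fails, `kubectl wait` runs into the timeout and no logs are printed.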

== Create a Hive cluster

The MySQL connection details can then be used in the definition of the Hive Metastore resource.
Note that it is also necessary to tell Hive where to find the driver.
This is done by setting the environment variable `METASTORE_AUX_JARS_PATH` to the path of the mounted driver:

[source,yaml]
----
---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive-with-drivers
spec:
  image:
    productVersion: 3.1.3
  clusterConfig:
    database:
      connString: jdbc:mysql://mysql:3306/hive # <1>
      user: hive # <2>
      password: hive
      dbType: mysql
    s3:
      reference: minio # <3>
  metastore:
    roleGroups:
      default:
        envOverrides:
          METASTORE_AUX_JARS_PATH: "/stackable/external-drivers/mysql-connector-j-8.0.31.jar" # <4>
        podOverrides: # <5>
          spec:
            containers:
              - name: hive
                volumeMounts:
                  - name: external-drivers
                    mountPath: /stackable/external-drivers
            volumes:
              - name: external-drivers
                persistentVolumeClaim:
                  claimName: pvc-hive-drivers
        replicas: 1
----

<1> The database connection details matching those given when deploying the MySQL Helm chart
<2> Plain-text Hive credentials will be replaced in an upcoming release!
<3> A reference to the file store using S3 (this has been omitted from this article for the sake of brevity, but is described in e.g. the xref:getting_started/first_steps.adoc[] guide)
<4> Use `envOverrides` to set the driver path
<5> Use `podOverrides` to mount the driver

NOTE: This has been tested on Azure AKS and Amazon EKS, both running Kubernetes 1.29.
The example uses a PVC with the access mode `ReadWriteOnce` because there is a single metastore instance, which is deployed only once the Jobs have completed. As long as the Jobs and the metastore run one after another, they can be scheduled on different nodes.
Other scenarios may require a different access mode, the availability of which depends on the type of cluster in use.

== Alternative: using a custom image

If you have access to a registry to store custom images, another approach is to build a custom image on top of a Stackable product image and "bake" the driver into it directly:

.Copy the driver
[source,dockerfile]
----
FROM docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev

RUN curl --fail -L https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.0.31/mysql-connector-j-8.0.31.jar -o /stackable/mysql-connector-j-8.0.31.jar
----

.Build and tag the image
[source,bash]
----
docker build -f ./Dockerfile -t docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev-mysql .
----
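
Before referencing the image in a cluster definition, it can be worth checking that the driver really ended up in it. The following sketch is an assumption-laden convenience and not part of the tested tutorial: it presumes that `ls` is available in the image, and the helper name `check_driver_in_image` is made up for illustration.

[source,bash]
----
# Hypothetical helper: list the driver jar inside an image without starting Hive.
# $1 is the image tag; the jar path matches the Dockerfile above.
# Assumes the image contains the ls binary.
check_driver_in_image() {
  docker run --rm --entrypoint ls "$1" -l /stackable/mysql-connector-j-8.0.31.jar
}
----

For example, `check_driver_in_image docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev-mysql` should list the jar. The image must then be pushed (e.g. with `docker push`) to a registry that the Kubernetes cluster can pull from.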

.Reference the new path to the driver without the need for using a volume mounted from a PVC
[source,yaml]
----
---
apiVersion: hive.stackable.tech/v1alpha1
kind: HiveCluster
metadata:
  name: hive
spec:
  image:
    custom: docker.stackable.tech/stackable/hive:3.1.3-stackable0.0.0-dev-mysql # <1>
    productVersion: 3.1.3
  clusterConfig:
    database:
      ...
    s3:
      ...
  metastore:
    config:
      logging:
        enableVectorAgent: false
    roleGroups:
      default:
        envOverrides:
          METASTORE_AUX_JARS_PATH: "/stackable/mysql-connector-j-8.0.31.jar" # <2>
        replicas: 1
----

<1> Name of the custom image containing the driver
<2> Path to the driver

docs/modules/hive/partials/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
 ** xref:hive:usage-guide/listenerclass.adoc[]
 ** xref:hive:usage-guide/data-storage.adoc[]
 ** xref:hive:usage-guide/derby-example.adoc[]
+** xref:hive:usage-guide/database-driver.adoc[]
 ** xref:hive:usage-guide/logging.adoc[]
 ** xref:hive:usage-guide/monitoring.adoc[]
 ** xref:hive:usage-guide/resources.adoc[]
