Commit 16f7ecb

docs: add links to GitHub and feature tracker (#445)
* formatting
* add links
* fix CRD links
* fix CRD links
* fix CRD links
1 parent 9da0e26 commit 16f7ecb

1 file changed

Lines changed: 46 additions & 36 deletions

docs/modules/hive/pages/index.adoc

@@ -1,72 +1,82 @@
 = Stackable Operator for Apache Hive
 :description: The Stackable Operator for Apache Hive is a Kubernetes operator that can manage Apache Hive metastores. Learn about its features, resources, dependencies and demos, and see the list of supported Hive versions.
 :keywords: Stackable Operator, Hadoop, Apache Hive, Kubernetes, k8s, operator, engineer, big data, metadata, storage, query
-
-This is an operator for Kubernetes that can manage https://hive.apache.org[Apache Hive] metastores. The Apache Hive
-metastore (HMS) was originally developed as part of Apache Hive. It stores information on the location of tables and
-partitions in file and blob storages such as xref:hdfs:index.adoc[Apache HDFS] and S3 and is now used by other tools
-besides Hive as well to access tables in files. This Operator does not support deploying Hive itself, but
-xref:trino:index.adoc[Trino] is recommended as an alternative query engine.
+:hive: https://hive.apache.org
+:github: https://github.com/stackabletech/hive-operator/
+:crd: {crd-docs-base-url}/hive-operator/{crd-docs-version}/
+:crd-hivecluster: {crd-docs}/hive.stackable.tech/hivecluster/v1alpha1/
+:feature-tracker: https://features.stackable.tech/unified
+
+[.link-bar]
+* {github}[GitHub {external-link-icon}^]
+* {feature-tracker}[Feature Tracker {external-link-icon}^]
+* {crd}[CRD documentation {external-link-icon}^]
+
+This is an operator for Kubernetes that can manage {hive}[Apache Hive] metastores.
+The Apache Hive metastore (HMS) was originally developed as part of Apache Hive.
+It stores information on the location of tables and partitions in file and blob storages such as xref:hdfs:index.adoc[Apache HDFS] and S3 and is now used by other tools besides Hive as well to access tables in files.
+This operator does not support deploying Hive itself, but xref:trino:index.adoc[Trino] is recommended as an alternative query engine.
 
 == Getting started
 
-Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable
-Hive Operator and its dependencies. It walks you through setting up a Hive metastore and connecting it to a demo
-Postgres database and a Minio instance to store data in.
+Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable Hive operator and its dependencies.
+It walks you through setting up a Hive metastore and connecting it to a demo Postgres database and a Minio instance to store data in.
 
-Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your Hive metastore
-configuration to your needs, or have a look at the <<demos, demos>> for some example setups with either
-xref:trino:index.adoc[Trino] or xref:spark-k8s:index.adoc[Spark].
+Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your Hive metastore configuration to your needs, or have a look at the <<demos, demos>> for some example setups with either xref:trino:index.adoc[Trino] or xref:spark-k8s:index.adoc[Spark].
 
 == Operator model
 
-The Operator manages the _HiveCluster_ custom resource. The cluster implements a single `metastore`
-xref:concepts:roles-and-role-groups.adoc[role].
+The operator manages the _HiveCluster_ custom resource.
+The cluster implements a single `metastore` xref:concepts:roles-and-role-groups.adoc[role].
 
-image::hive_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable Operator for Apache Hive]
+image::hive_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable operator for Apache Hive]
 
-For every role group the Operator creates a ConfigMap and StatefulSet which can have multiple replicas (Pods). Every
-role group is accessible through its own Service, and there is a Service for the whole cluster.
+For every role group the operator creates a ConfigMap and StatefulSet which can have multiple replicas (Pods).
+Every role group is accessible through its own Service, and there is a Service for the whole cluster.
 
-The Operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the Hive metastore
-instance. The discovery ConfigMap contains information on how to connect to the HMS.
+The operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the Hive metastore instance.
+The discovery ConfigMap contains information on how to connect to the HMS.
 
 == Dependencies
 
-The Stackable Operator for Apache Hive depends on the Stackable xref:commons-operator:index.adoc[commons],
-xref:secret-operator:index.adoc[secret] and xref:listener-operator:index.adoc[listener] operators.
+The Stackable operator for Apache Hive depends on the Stackable xref:commons-operator:index.adoc[commons], xref:secret-operator:index.adoc[secret] and xref:listener-operator:index.adoc[listener] operators.
 
 == Required external component: An SQL database
 
-The Hive metastore requires a database to store metadata. Consult the xref:required-external-components.adoc[required
-external components page] for an overview of the supported databases and minimum supported versions.
+The Hive metastore requires an SQL database to store metadata.
+Consult the xref:required-external-components.adoc[required external components page] for an overview of the supported databases and minimum supported versions.
 
-== [[demos]]Demos
+== [[demos]]Demos
 
 Three demos make use of the Hive metastore.
 
-The xref:demos:spark-k8s-anomaly-detection-taxi-data.adoc[] and xref:demos:trino-taxi-data.adoc[] use the HMS to store
-metadata information about taxi data. The first demo then analyzes the data using xref:spark-k8s:index.adoc[Apache Spark]
-and the second one using xref:trino:index.adoc[Trino].
+The xref:demos:spark-k8s-anomaly-detection-taxi-data.adoc[] and xref:demos:trino-taxi-data.adoc[] use the HMS to store metadata information about taxi data.
+The first demo then analyzes the data using xref:spark-k8s:index.adoc[Apache Spark] and the second one using xref:trino:index.adoc[Trino].
 
-The xref:demos:data-lakehouse-iceberg-trino-spark.adoc[] demo is the biggest demo available. It uses both Spark and
-Trino for analysis.
+The xref:demos:data-lakehouse-iceberg-trino-spark.adoc[] demo is the biggest demo available.
+It uses both Spark and Trino for analysis.
 
 == Why is the Hive query engine not supported?
 
-Only the metastore is supported, not Hive itself. There are several reasons why running Hive on Kubernetes may not be an
-optimal solution. The most obvious reason is that Hive requires YARN as an execution framework, and YARN assumes much of
-the same role as Kubernetes - i.e. assigning resources. For this reason we provide xref:trino:index.adoc[Trino] as a
-query engine in the Stackable Data Platform instead of Hive. Trino still uses the Hive Metastore, hence the inclusion of
-this operator as well. Trino should offer all the capabilities Hive offers including a lot of additional functionality,
-such as connections to other data sources.
+Only the metastore is supported, not Hive itself.
+There are several reasons why running Hive on Kubernetes may not be an optimal solution.
+The most obvious reason is that Hive requires YARN as an execution framework, and YARN assumes much of the same role as Kubernetes - i.e. assigning resources.
+For this reason we provide xref:trino:index.adoc[Trino] as a query engine in the Stackable Data Platform instead of Hive.
+Trino still uses the Hive Metastore, hence the inclusion of this operator as well.
+Trino should offer all the capabilities Hive offers including a lot of additional functionality, such as connections to other data sources.
 
 Additionally, Tables in the HMS can also be accessed from xref:spark-k8s:index.adoc[Apache Spark].
 
 == Supported versions
 
-The Stackable Operator for Apache Hive currently supports the Hive versions listed below.
+The Stackable operator for Apache Hive currently supports the Hive versions listed below.
 To use a specific Hive version in your HiveCluster, you have to specify an image - this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
 The operator also supports running images from a custom registry or running entirely customized images; both of these cases are explained under xref:concepts:product-image-selection.adoc[] as well.
 
 include::partial$supported-versions.adoc[]
+
+== Useful links
+
+* The {github}[hive-operator {external-link-icon}^] GitHub repository
+* The operator feature overview in the {feature-tracker}[feature tracker {external-link-icon}^]
+* The {crd-hivecluster}[HiveCluster {external-link-icon}^] CRD documentation