
RHOCP) OpenShift Logging (2) - Operator Installation

by LILO 2023. 8. 1.

INTRO

In the previous post, I gave a brief introduction to the OpenShift Logging Operator.

 

RHOCP) OpenShift Logging (1) - Overview (lilo.tistory.com)

In this post, I walk through the installation of the OpenShift Logging Operator.

 

 

 

OpenShift Elasticsearch Operator and OpenShift Logging Operator - CLI Installation

Elasticsearch is a memory-intensive application. Since we deploy three Elasticsearch nodes with at least 16GB of memory each, the Infra nodes (or Compute nodes) they are scheduled on must have 16GB of memory or more.

If Elasticsearch fails to serve because of OOM, I recommend scaling out by adding nodes rather than increasing the heap memory.
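As a sketch of that scale-out (assuming the ClusterLogging instance is named "instance", as created later in this post), increasing nodeCount is a one-line patch:

# oc -n openshift-logging patch clusterlogging/instance --type merge -p '{"spec":{"logStore":{"elasticsearch":{"nodeCount":5}}}}'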

 

Create the namespace (project) where Elasticsearch will be installed.

Set the openshift.io/cluster-monitoring: "true" label so that Prometheus can collect metrics from this namespace.

# cat << EOF > eo-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-operators-redhat
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF


<Create and verify the namespace>
# oc create -f  eo-namespace.yaml
namespace/openshift-operators-redhat created

# oc get project openshift-operators-redhat
NAME                         DISPLAY NAME   STATUS
openshift-operators-redhat                  Active
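You can also double-check that the label Prometheus needs is actually set on the namespace:

# oc get ns openshift-operators-redhat --show-labels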

Create the namespace where the OpenShift Logging Operator will be installed.

Set the openshift.io/cluster-monitoring: "true" label so that Prometheus can collect metrics from this namespace.

# cat << EOF > olo-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-logging
  annotations:
    openshift.io/node-selector: ""
  labels:
    openshift.io/cluster-monitoring: "true"
EOF


<Create and verify the namespace>
# oc create -f  olo-namespace.yaml
namespace/openshift-logging created

# oc get project openshift-logging
NAME                DISPLAY NAME   STATUS
openshift-logging                  Active

Create an Operator group for the Elasticsearch Operator.

Operator group: groups all Operators deployed in the same namespace and determines where their CRs (Custom Resources) are watched - either a list of namespaces or the whole cluster
# cat << EOF > eo-og.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-operators-redhat
  namespace: openshift-operators-redhat
spec: {}
EOF


<Create and verify the Operator group>
# oc create -f eo-og.yaml
operatorgroup.operators.coreos.com/openshift-operators-redhat created

# oc get og -n openshift-operators-redhat
NAME                         AGE
openshift-operators-redhat   13s
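An empty spec ({}) makes this Operator group target all namespaces; OLM records that as a single empty-string entry in the group's status.namespaces, which you can confirm with:

# oc get og openshift-operators-redhat -n openshift-operators-redhat -o jsonpath='{.status.namespaces}'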

Create a Subscription for the Elasticsearch Operator and check the CSV.

The Elasticsearch Operator is installed into the "openshift-operators-redhat" namespace. (The catalog source "redhat-operator-index-ocptb" used below is the mirrored index of this disconnected environment; on a connected cluster the source is typically "redhat-operators".)

- Subscription: tracks a channel of a package and keeps the CSV up to date
- CSV (Cluster Service Version): represents a specific version of an Operator running on the cluster
# cat << EOF > eo-sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: "elasticsearch-operator"
  namespace: "openshift-operators-redhat"
spec:
  channel: "stable-5.5"
  installPlanApproval: "Automatic"
  source: "redhat-operator-index-ocptb"
  sourceNamespace: "openshift-marketplace"
  name: "elasticsearch-operator"
EOF


<Create and verify the Subscription>
# oc create -f  eo-sub.yaml
subscription.operators.coreos.com/elasticsearch-operator created

# oc get sub -n openshift-operators-redhat
NAME                     PACKAGE                  SOURCE                        CHANNEL
elasticsearch-operator   elasticsearch-operator   redhat-operator-index-ocptb   stable

# oc get csv --all-namespaces |grep elasticsearch-operator
default                                            elasticsearch-operator.v5.7.2   OpenShift Elasticsearch Operator   5.7.2     elasticsearch-operator.v5.7.1   Succeeded
kube-node-lease                                    elasticsearch-operator.v5.7.2   OpenShift Elasticsearch Operator   5.7.2     elasticsearch-operator.v5.7.1   Succeeded
kube-public                                        elasticsearch-operator.v5.7.2   OpenShift Elasticsearch Operator   5.7.2     elasticsearch-operator.v5.7.1   Succeeded
kube-system                                        elasticsearch-operator.v5.7.2   OpenShift Elasticsearch Operator   5.7.2     elasticsearch-operator.v5.7.1   Succeeded
openshift-apiserver-operator                       elasticsearch-operator.v5.7.2   OpenShift Elasticsearch Operator   5.7.2     elasticsearch-operator.v5.7.1   Succeeded
...(omitted)
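Two notes on the output above: the CSV appears in every namespace because the Elasticsearch Operator is installed cluster-wide, so OLM copies its CSV into each namespace. And since installPlanApproval is "Automatic", the InstallPlan is approved for you; with "Manual" you would have to approve it yourself before the CSV is created - a sketch, with the install plan name as a placeholder:

# oc get installplan -n openshift-operators-redhat
# oc patch installplan <installplan-name> -n openshift-operators-redhat --type merge -p '{"spec":{"approved":true}}'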

Create an Operator group for the OpenShift Logging Operator.

# cat << EOF > olo-og.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  targetNamespaces:
  - openshift-logging
EOF


<Create and verify the Operator group>
# oc create -f olo-og.yaml
operatorgroup.operators.coreos.com/cluster-logging created

# oc get og -n openshift-logging
NAME              AGE
cluster-logging   16s

 

Create a Subscription for the OpenShift Logging Operator and check the CSV.

The OpenShift Logging Operator is installed into the "openshift-logging" namespace.

# cat << EOF > olo-sub.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-logging
  namespace: openshift-logging
spec:
  channel: "stable"
  name: cluster-logging
  source: redhat-operator-index-ocptb
  sourceNamespace: openshift-marketplace
EOF


<Create and verify the Subscription>
# oc create -f olo-sub.yaml
subscription.operators.coreos.com/cluster-logging created

# oc get sub -n openshift-logging
NAME              PACKAGE           SOURCE                        CHANNEL
cluster-logging   cluster-logging   redhat-operator-index-ocptb   stable

# oc get csv -A |grep -i openshift-logging
openshift-logging                                  cluster-logging.v5.7.2          Red Hat OpenShift Logging          5.7.2     cluster-logging.v5.7.1          Succeeded

 

 

Creating the OpenShift Logging Instance

Write the instance YAML for the OpenShift Logging Operator and create it.

Because the Infra nodes here have a taint set, tolerations must be configured.

- curator: performs recurring Elasticsearch maintenance tasks such as index creation/deletion, snapshot creation/deletion, and shard routing allocation changes (repeated on a cron schedule)
- .spec.logStore.elasticsearch.storage: {}   : setting storage to {} creates an emptyDir volume
※ emptyDir: an ephemeral volume that is created when the Pod starts and deleted together with the Pod
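For anything beyond a test setup you would normally give Elasticsearch persistent storage instead of emptyDir - a minimal sketch of the storage block (the StorageClass name is only an example; use one that exists in your cluster):

    elasticsearch:
      storage:
        storageClassName: "gp2"   # example name - pick a real StorageClass
        size: 200G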
# cat << EOF > olo-instance.yaml
apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogging"
metadata:
  name: "instance"
  namespace: "openshift-logging"
spec:
  managementState: "Managed"
  logStore:
    type: "elasticsearch"
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
      storage: {}
      resources:
        limits:
          memory: "8Gi"
        requests:
          memory: "8Gi"
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: "SingleRedundancy"
  visualization:
    type: "kibana"
    kibana:
      replicas: 1
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
  curation:
    type: "curator"
    curator:
      schedule: "30 3 * * *"
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoSchedule
      - key: node-role.kubernetes.io/infra
        value: reserved
        effect: NoExecute
  collection:
    logs:
      type: "fluentd"
      fluentd:
        tolerations:
        - key: node-role.kubernetes.io/infra
          value: reserved
          effect: NoSchedule
        - key: node-role.kubernetes.io/infra
          value: reserved
          effect: NoExecute
EOF


<Create and verify the OpenShift Logging instance>
# oc create -f olo-instance.yaml

# oc get pod -n openshift-logging  -o wide
NAME                                            READY   STATUS      RESTARTS   AGE     IP            NODE                       NOMINATED NODE   READINESS GATES
cluster-logging-operator-6f7d96cfb5-gvv84       1/1     Running     0          37m     10.131.0.25   worker2.ocp4.example.com   <none>           <none>
collector-29d4x                                 2/2     Running     0          77s     10.128.2.31   worker1.ocp4.example.com   <none>           <none>
collector-j4hk8                                 2/2     Running     0          76s     10.129.0.68   master1.ocp4.example.com   <none>           <none>
collector-n6nd8                                 2/2     Running     0          77s     10.130.2.28   infra2.ocp4.example.com    <none>           <none>
collector-qdsqn                                 2/2     Running     0          77s     10.129.2.23   infra1.ocp4.example.com    <none>           <none>
collector-r565w                                 2/2     Running     0          77s     10.128.0.45   master2.ocp4.example.com   <none>           <none>
collector-tm7mt                                 2/2     Running     0          77s     10.131.0.31   worker2.ocp4.example.com   <none>           <none>
collector-xl47z                                 2/2     Running     0          76s     10.130.0.42   master3.ocp4.example.com   <none>           <none>
collector-xzkvh                                 2/2     Running     0          77s     10.131.2.21   infra3.ocp4.example.com    <none>           <none>
elasticsearch-cdm-ihn3yrdv-1-85cdbd7b4f-mgrtl   2/2     Running     0          22m     10.130.2.14   infra2.ocp4.example.com    <none>           <none>
elasticsearch-cdm-ihn3yrdv-2-6674db8445-dsk2l   2/2     Running     0          22m     10.131.2.14   infra3.ocp4.example.com    <none>           <none>
elasticsearch-cdm-ihn3yrdv-3-6997c99649-hmtkv   2/2     Running     0          22m     10.129.2.14   infra1.ocp4.example.com    <none>           <none>
elasticsearch-im-app-28181595-5zjkd             0/1     Completed   0          3m53s   10.130.2.26   infra2.ocp4.example.com    <none>           <none>
elasticsearch-im-audit-28181595-ln2cp           0/1     Completed   0          3m53s   10.130.2.25   infra2.ocp4.example.com    <none>           <none>
elasticsearch-im-infra-28181595-bdhm5           0/1     Completed   0          3m53s   10.130.2.27   infra2.ocp4.example.com    <none>           <none>
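To confirm the Elasticsearch cluster itself is healthy, you can query its health endpoint from inside one of the elasticsearch pods (es_util is a helper shipped in the elasticsearch container; use a pod name from the listing above):

# oc exec -n openshift-logging -c elasticsearch elasticsearch-cdm-ihn3yrdv-1-85cdbd7b4f-mgrtl -- es_util --query='_cluster/health?pretty'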

 

 

Kibana Fails to Start When Deploying the OpenShift Logging Instance

When the OpenShift Logging instance is created, the Kibana pod sometimes fails to start.

 

Check the logs of the Elasticsearch Operator pod.

The logs show that egress HTTPS calls to the image registry failed, so required resources went missing and Kibana did not start.

# oc get pod -n openshift-operators-redhat
NAME                                      READY   STATUS    RESTARTS   AGE
elasticsearch-operator-698bb969f4-shkfr   2/2     Running   0          119m

# oc logs -c elasticsearch-operator elasticsearch-operator-698bb969f4-shkfr -n openshift-operators-redhat |more
...(omitted)
{"_ts":"2023-08-01T12:06:35.919878649Z","_level":"0","_component":"elasticsearch-operator_controllers_Kibana","_message":"Registering future events","name":{"Namespace":"openshift-logging","Name":"kibana"}}
{"_ts":"2023-08-01T12:06:57.139310355Z","_level":"0","_component":"elasticsearch-operator_controllers_Elasticsearch","_message":"Updated Elasticsearch","cluster":"elasticsearch","namespace":"openshift-logging","retries":0}
{"_ts":"2023-08-01T12:07:08.266232743Z","_level":"0","_component":"elasticsearch-operator","_message":"Reconciler error","Kibana":{"name":"kibana","namespace":"openshift-logging"},"_error":{"cause":{"cause":{"ErrStatus":{"metadata":{},"status":"Failure","message":"ConfigMap \"kibana-trusted-ca-bundle\" not found","reason":"NotFound","details":{"name":"kibana-trusted-ca-bundle","kind":"ConfigMap"},"code":404}},"msg":"failed to get configmap","name":"kibana-trusted-ca-bundle","namespace":"openshift-logging"},"cluster":"kibana","msg":"failed to get trusted CA bundle config map"},"controller":"kibana-controller","controllerGroup":"logging.openshift.io","controllerKind":"Kibana","name":"kibana","namespace":"openshift-logging","reconcileID":"65b185a2-8ff3-4e35-ad6d-db8e21fda738"}
{"_ts":"2023-08-01T12:07:08.311912381Z","_level":"0","_component":"elasticsearch-operator","_message":"Reconciler error","Kibana":{"name":"kibana","namespace":"openshift-logging"},"_error":{"cause":{"msg":"ImageStream tag contains no images","name":"oauth-proxy","namespace":"openshift","tag":"v4.4"},"msg":"Failed to get oauth-proxy image"},"controller":"kibana-controller","controllerGroup":"logging.openshift.io","controllerKind":"Kibana","name":"kibana","namespace":"openshift-logging","reconcileID":"2779a52b-491f-46cd-9c9d-703666889eb4"}

Check the oauth-proxy ImageStream to see whether the certificate is valid.

The result confirms a CA certificate error against the image registry.

#  oc get -o yaml is/oauth-proxy -n openshift
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
  name: oauth-proxy
  namespace: openshift
...(omitted)
status:
  dockerImageRepository: ""
  tags:
  - conditions:
    - generation: 617
      lastTransitionTime: "2023-08-01T13:08:33Z"
      message: 'Internal error occurred: [harbor.example.com:443/ocp4/openshift4@sha256:330d1bc787a8c84b3f7b02f50fb9be2e840361aefb1580d70ff75cb1f73a4a15:
        Get "https://harbor.example.com:443/v2/": x509: certificate signed by unknown
        authority, quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:330d1bc787a8c84b3f7b02f50fb9be2e840361aefb1580d70ff75cb1f73a4a15:
        Get "https://quay.io/v2/": dial tcp 44.194.1.249:443: connect: connection
        refused]'
      reason: InternalError
      status: "False"
      type: ImportSuccess
    items: null
    tag: v4.4

Check whether the cluster's proxy configuration includes the CA certificate.

Nothing is defined under spec, so we set the "user-ca-bundle" configmap that already exists in the cluster as the trustedCA.

# oc get proxy cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Proxy
metadata:
  creationTimestamp: "2023-07-25T07:30:44Z"
  generation: 4
  name: cluster
  resourceVersion: "3331271"
  uid: 4ceb82ab-7980-4384-b425-27baa08e4980
spec: {}
status: {}

"user-ca-bundle" configmap을 조회합니다. 

It contains the "additionalTrustBundle" that was used in install-config.yaml - in other words, the image registry's CA certificate.

# oc get cm user-ca-bundle -n openshift-config -o json |jq .data
{
  "ca-bundle.crt": "-----BEGIN CERTIFICATE-----\BsZTE...(생략)\n-----END CERTIFICATE-----\n"
}
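For reference, this is roughly how that bundle is supplied at install time (install-config.yaml excerpt; certificate body omitted):

additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  ...(omitted)
  -----END CERTIFICATE-----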

Now add the "user-ca-bundle" configmap confirmed above to the proxy.

"user-ca-bundle" configmap 이 trustedCA로 정의된 것을 확인하였습니다.

# oc patch proxy/cluster --type merge -p '{"spec":{"trustedCA":{"name": "user-ca-bundle"}}}'


<Verify the trustedCA was added>
# oc get proxy cluster -o json |jq .spec
{
  "trustedCA": {
    "name": "user-ca-bundle"
  }
}
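With trustedCA set, the Cluster Network Operator injects the bundle into ConfigMaps that are labeled for CA injection, so the "kibana-trusted-ca-bundle" ConfigMap from the earlier error should eventually appear - worth verifying:

# oc get cm kibana-trusted-ca-bundle -n openshift-logging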

Delete the problematic oauth-proxy ImageStream and check it again. (It is recreated automatically after a few minutes; never create it by hand.)

# oc delete is -n openshift oauth-proxy
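(An alternative, instead of deleting: my understanding is that once the CA is trusted, re-triggering the import of the failing tag also works - it re-runs the existing import rather than creating anything by hand.)

# oc import-image oauth-proxy:v4.4 -n openshift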


<Verify the CA certificate error is resolved>
# oc get is -n openshift oauth-proxy -o json |jq .status
{
  "dockerImageRepository": "",
  "tags": [
    {
      "items": [
        {
          "created": "2023-08-01T13:24:11Z",
          "dockerImageReference": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:330d1bc787a8c84b3f7b02f50fb9be2e840361aefb1580d70ff75cb1f73a4a15",
          "generation": 2,
          "image": "sha256:330d1bc787a8c84b3f7b02f50fb9be2e840361aefb1580d70ff75cb1f73a4a15"
        }
      ],
      "tag": "v4.4"
    }
  ]
}

Check that the Kibana pod is now running.

# oc get pod -n openshift-logging
NAME                                            READY   STATUS      RESTARTS   AGE
cluster-logging-operator-6f7d96cfb5-gvv84       1/1     Running     0          88m
collector-29d4x                                 2/2     Running     0          52m
collector-j4hk8                                 2/2     Running     0          52m
collector-n6nd8                                 2/2     Running     0          52m
collector-qdsqn                                 2/2     Running     0          52m
collector-r565w                                 2/2     Running     0          52m
collector-tm7mt                                 2/2     Running     0          52m
collector-xl47z                                 2/2     Running     0          52m
collector-xzkvh                                 2/2     Running     0          52m
elasticsearch-cdm-ihn3yrdv-1-85cdbd7b4f-mgrtl   2/2     Running     0          74m
elasticsearch-cdm-ihn3yrdv-2-6674db8445-dsk2l   2/2     Running     0          74m
elasticsearch-cdm-ihn3yrdv-3-6997c99649-hmtkv   2/2     Running     0          74m
elasticsearch-im-app-28181640-sq9m9             0/1     Completed   0          10m
elasticsearch-im-audit-28181640-7vl75           0/1     Completed   0          10m
elasticsearch-im-infra-28181640-8nfwp           0/1     Completed   0          10m
kibana-64985c788c-lg4fz                         2/2     Running     0          46m

Finally, look up the Route to find the address of the Kibana console.

Because most users do not connect from a PC on the same network segment, you need to add the route address below to the hosts file on your Windows PC.

EX)  <IP address>   kibana-openshift-logging.apps.ocp4.example.com

# oc get route -n openshift-logging
NAME     HOST/PORT                                        PATH   SERVICES   PORT    TERMINATION          WILDCARD
kibana   kibana-openshift-logging.apps.ocp4.example.com          kibana     <all>   reencrypt/Redirect   None
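To find the IP address for that hosts entry, check where the default router is exposed (assuming the default IngressController publishes the "router-default" service in openshift-ingress; on bare-metal/hostNetwork setups, use the IP of a node running the router instead):

# oc get svc router-default -n openshift-ingress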

Kibana console screen

 
