Secure connection to drydock is failing. #212

Closed
nagajagan opened this issue Apr 8, 2022 · 8 comments
Labels
bug Something isn't working triage

Comments

@nagajagan

Describe the bug
The "Installing Drydock Boot Actions" step fails during node deployment.

Steps To Reproduce
Keep Treasuremap at version 2227df4 and follow the steps to bring up the genesis node.

Expected behavior
Drydock should complete deployment of nodes.

Environment

  • Treasuremap version: 2227df4
  • Treasuremap site type: cruiser
  • Airshipctl version:
  • Operating system: Ubuntu 18.04.6 LTS (Bionic Beaver)
  • Kernel version: 4.15.0-167-generic
  • Kubernetes version: v1.17.2
  • Go version:
  • Hypervisor level 0 (if applicable):
  • Hardware specs (e.g. 4 vCPUs, 16GB RAM, bare metal vs VM):

Detailed logs within drydock
`Installing Drydock Boot Actions.
start: cmd-install/stage-late/drydock_01/cmd-in-target: curtin command in-target
Running command ['mount', '--bind', '/dev', '/tmp/tmpt3f8gvqn/target/dev'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/proc', '/tmp/tmpt3f8gvqn/target/proc'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/run', '/tmp/tmpt3f8gvqn/target/run'] with allowed return codes [0] (capture=False)
Running command ['mount', '--bind', '/sys', '/tmp/tmpt3f8gvqn/target/sys'] with allowed return codes [0] (capture=False)
Running command ['unshare', '--help'] with allowed return codes [0] (capture=True)
Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpt3f8gvqn/target', 'wget', '--no-proxy', '--no-check-certificate', '--header=X-Bootaction-Key: e27bba27178686a0112252ab215042a4a85a3aa76978be5b2d3cba845c770491', 'https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc19/units', '-O', '/tmp/bootaction-units.tar.gz'] with allowed return codes [0] (capture=False)
--2022-04-07 14:47:04-- https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc19/units
Resolving drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)... 10.109.82.10
Connecting to drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)|10.109.82.10|:443... connected.
WARNING: cannot verify drydock-nc.att-5gcore.bete.ericy.com's certificate, issued by ‘CN=Kubernetes Ingress Controller Fake Certificate,O=Acme Co’:
  Unable to locally verify the issuer's authority.
WARNING: no certificate subject alternative name matches
  requested host name ‘drydock-nc.att-5gcore.bete.ericy.com’.
HTTP request sent, awaiting response... 404 Not Found
2022-04-07 14:47:04 ERROR 404: Not Found.
Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
TIMED subp(['udevadm', 'settle']): 0.010`

@nagajagan nagajagan added bug Something isn't working triage labels Apr 8, 2022
@jasvinder1107
Contributor

jasvinder1107 commented Apr 15, 2022

As the error itself shows, the ingress is serving the fake certificate, which essentially means the certs were not generated by Promenade during installation. If the ingress is not given the valid internal certs generated by the commands below, requests to the ingress FQDN will be answered with the fake cert and the installation will not behave as expected.

mkdir ${NEW_SITE}_certs
sudo tools/airship promenade generate-certs \
  -o /target/${NEW_SITE}_certs /target/${NEW_SITE}_collected/*.yaml

mkdir -p site/${NEW_SITE}/secrets/certificates
sudo cp ${NEW_SITE}_certs/certificates.yaml \
  site/${NEW_SITE}/secrets/certificates/certificates.yaml
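
One way to confirm which certificate the ingress is actually presenting is to read the issuer off the live endpoint and compare it with the fake-certificate issuer reported in the wget warning. This is only a minimal sketch, assuming `openssl` is installed on the host; `served_issuer` and `is_fake_ingress_issuer` are hypothetical helper names, not part of Airship or Promenade.

```shell
# Hypothetical helpers for illustration only.

# Print the issuer of the certificate served on <host>:443.
# -servername sends SNI so the ingress selects the certificate for that FQDN.
served_issuer() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Succeed (exit 0) when the issuer text on stdin names the ingress
# controller's placeholder certificate seen in the wget warning.
is_fake_ingress_issuer() {
  grep -q 'Kubernetes Ingress Controller Fake Certificate'
}
```

For example, `served_issuer drydock-nc.att-5gcore.bete.ericy.com | is_fake_ingress_issuer && echo "still serving the fake cert"` would flag an ingress that has not picked up the generated certificates.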

@nagajagan
Author

In site/xxxxx/secrets/certificates/ingress.yaml, ingress-crt-site needs to have the following content; that should solve the problem.

-----BEGIN CERTIFICATE-----
Ingress Certificates
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
Intermediate Certificate
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
Root certificate
-----END CERTIFICATE-----
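
A quick sanity check on the bundle is to count the PEM blocks it contains; for the layout above the expected count is three (ingress, intermediate, root). The following is a rough sketch in which `count_pem_certs` is a hypothetical helper. It says nothing about whether the certificates are valid or in the right order.

```shell
# Hypothetical helper: count the certificates in a PEM bundle
# (also works when the PEM text is embedded, indented, in a YAML file).
#   $1 = path to the file
count_pem_certs() {
  grep -c -e '-----BEGIN CERTIFICATE-----' "$1"
}
```

If the three certificates are also available as separate files, `openssl verify -CAfile root.pem -untrusted intermediate.pem ingress.pem` (file names assumed here) checks that the chain actually links up.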

@jasvinder1107
Contributor

Just to clarify for future readers: the cert chain must be installed in ingress.yaml. If it is not installed properly, the call from the client to the ingress will fail with SSL code 21, which means the certificate could not be verified. Please check the public certs in the ingress definition for the corresponding services.

@nagajagan
Author

Including certificate chain in the ingress.yaml didn't solve the problem of drydock connectivity. It only solved the shipyard connectivity problem.

@jasvinder1107
Contributor

Starting from 2.7, the DNS entry for drydock should resolve to ingress-nc, not ingress-uc. Please correct the DNS entry and that should fix the problem.
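
To see which address the FQDN resolves to from the deploying node's point of view, the `Resolving ...` line that wget prints can be pulled out of the curtin log (the log in the issue description shows 10.109.82.10). A small sketch follows; `resolved_ip` is a hypothetical helper, and mapping the address to ingress-nc or ingress-uc still has to be done against your own site definition.

```shell
# Hypothetical helper: read a wget/curtin log on stdin and print the
# address part of lines such as
#   Resolving <host> (<host>)... 10.109.82.10
resolved_ip() {
  sed -n 's/^[[:space:]]*Resolving .*)\.\.\. \(.*\)$/\1/p'
}
```

On a host that uses the same resolver, `getent hosts drydock-nc.att-5gcore.bete.ericy.com` gives the current answer without waiting for another deployment attempt.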

@nagajagan
Author

[Screenshot: 45c6953d-c0df-42e0-9e60-c75df5e88186]

After pointing drydock-nc to ingress-nc, this is the issue we observe on the controller iDRAC consoles while PXE booting.
It is not caused by a firewall.
Which default routes do you suggest changing?

@nagajagan
Author

nagajagan commented May 13, 2022

Logs from the MAAS GUI

Stdout: start: cmd-install/stage-late/drydock_02/cmd-in-target: curtin command in-target
        Running command ['mount', '--bind', '/dev', '/tmp/tmpdi62xy0q/target/dev'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/proc', '/tmp/tmpdi62xy0q/target/proc'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/run', '/tmp/tmpdi62xy0q/target/run'] with allowed return codes [0] (capture=False)
        Running command ['mount', '--bind', '/sys', '/tmp/tmpdi62xy0q/target/sys'] with allowed return codes [0] (capture=False)
        Running command ['unshare', '--help'] with allowed return codes [0] (capture=True)
        Running command ['unshare', '--fork', '--pid', '--', 'chroot', '/tmp/tmpdi62xy0q/target', 'wget', '--no-proxy', '--no-check-certificate', '--header=X-Bootaction-Key: ae631ad31b0bdbe53601f4da35375040bac0bc446a245858f2b33d759ae101df', 'https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files', '-O', '/tmp/bootaction-files.tar.gz'] with allowed return codes [0] (capture=False)
        --2022-05-10 17:10:52--  https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files
        Resolving drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)... 10.109.84.189
        Connecting to drydock-nc.att-5gcore.bete.ericy.com (drydock-nc.att-5gcore.bete.ericy.com)|10.109.84.189|:443... connected.
        HTTP request sent, awaiting response... 500 Internal Server Error
        2022-05-10 17:12:29 ERROR 500: Internal Server Error.       

        Running command ['udevadm', 'settle'] with allowed return codes [0] (capture=False)
        TIMED subp(['udevadm', 'settle']): 0.010
        Running command ['umount', '/tmp/tmpdi62xy0q/target/sys'] with allowed return codes [0] (capture=False)
        Running command ['umount', '/tmp/tmpdi62xy0q/target/run'] with allowed return codes [0] (capture=False)
        Running command ['umount', '/tmp/tmpdi62xy0q/target/proc'] with allowed return codes [0] (capture=False)
        Running command ['umount', '/tmp/tmpdi62xy0q/target/dev'] with allowed return codes [0] (capture=False)
        finish: cmd-install/stage-late/drydock_02/cmd-in-target: FAIL: curtin command in-target
        

Stderr: ''

The same endpoint called with curl:

root@att5gc20:~# curl --header "X-Bootaction-Key: ae631ad31b0bdbe53601f4da35375040bac0bc446a245858f2b33d759ae101df" https://drydock-nc.att-5gcore.bete.ericy.com/api/v1.0/bootactions/nodes/att5gc18/files
{"title": "Error when running bootaction pipeline segment utf8_decode: AttributeError - 'NoneType' object has no attribute 'decode'"}

We don't see any log output in the drydock pods that would help find the root cause of this issue.
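
When comparing the wget and curl results, it can help to capture the HTTP status and the error title separately. The sketch below rests on several assumptions: `bootaction_status` and `error_title` are hypothetical names, `BOOTACTION_KEY` is assumed to hold the key for the node, and the `sed` expression is a crude extraction that only suits single-line bodies like the one above (it is not a JSON parser).

```shell
# Hypothetical helper: print only the HTTP status code for a bootaction URL.
# Assumes BOOTACTION_KEY is set; the response body is discarded.
#   $1 = full URL
bootaction_status() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    --header "X-Bootaction-Key: ${BOOTACTION_KEY}" "$1"
}

# Hypothetical helper: pull the "title" field out of a one-line JSON
# error body supplied on stdin.
error_title() {
  sed -n 's/.*"title": *"\(.*\)"}.*/\1/p'
}
```

Running `bootaction_status` against both the `units` and `files` URLs shows quickly whether the failure is specific to one endpoint.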

@nagajagan
Author

The initial issue was fixed by adding the proper routes in the environment.

#212 (comment) is addressed by using the right version of the image for promenade; the fix was tested by the reporter.

       promenade:
         location: https://opendev.org/airship/promenade
-        reference: 27f181a9d30294030d695b747b2e4560ffbd29be
+        reference: d161528ae8de0dcb0dd9d39bc370f85f2aa1c462
         subpath: charts/promenade
         type: git
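
To check which promenade reference a site is currently pinned to, the `reference:` line under the `promenade:` key can be read out of the file that carries this block. This is a minimal sketch with a hypothetical `chart_reference` helper. It assumes the layout shown in the diff above, stops looking when it reaches another bare `key:` line, and is no substitute for a real YAML parser.

```shell
# Hypothetical helper: print the git reference pinned for a chart.
#   $1 = YAML file, $2 = chart key (for example: promenade)
chart_reference() {
  awk -v chart="$2" '
    # Strip indentation so keys can be compared directly.
    { line = $0; sub(/^[ \t]+/, "", line) }
    # Entering the block for the requested chart.
    line == chart ":" { in_chart = 1; next }
    # First reference line inside that block wins.
    in_chart && line ~ /^reference:/ {
      sub(/^reference:[ \t]*/, "", line)
      print line
      exit
    }
    # Any other bare "key:" line starts a different block.
    in_chart && line ~ /^[^ ]+:$/ { in_chart = 0 }
  ' "$1"
}
```

After the change above, `chart_reference <file> promenade` would be expected to print the new commit hash rather than the old one.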
