Notes on setting up Dockerized LDAP for new protocol.club host

Table of Contents
_________________

1 Notes on setting up Dockerized LDAP for new protocol.club host
.. 1.1 Notes [2019-04-04 Thu]
..... 1.1.1 Plan for tonight
..... 1.1.2 DNS Complication
..... 1.1.3 Manual Setup
..... 1.1.4 Memory issues
.. 1.2 Notes [2019-04-05 Fri]
..... 1.2.1 Memory issues cont.
..... 1.2.2 Plan - rebuild image with ulimit fix
..... 1.2.3 Observation: Service doesn't stop cleanly
..... 1.2.4 Alternative option: set ulimit from docker run
.. 1.3 Notes [2019-04-05 Fri]
..... 1.3.1 Fixed memory issue
..... 1.3.2 Also fixing the way the service stops the container


1 Notes on setting up Dockerized LDAP for new protocol.club host
================================================================

  Probably use something like this:
  [https://github.com/hashbang/docker-slapd]

  Maybe use Ansible to configure the host itself, and a systemd unit to
  run that container, possibly borrowing from:
  [https://github.com/hashbang/admin-tools/]


1.1 Notes [2019-04-04 Thu]
~~~~~~~~~~~~~~~~~~~~~~~~~~

1.1.1 Plan for tonight
----------------------

  Created a snapshot of the unconfigured host.

  The plan is to do the setup manually at first, and then to automate
  it, probably with Ansible.


1.1.2 DNS Complication
----------------------

  First complication: can't obtain certs from Let's Encrypt without DNS
  for ldap.protocol.club.

  DNS is still hosted by Gandi.  Tried to transfer the domain to Route
  53, but the transfer is not allowed: "protocol.club" is listed as a
  "premium" domain for .club, and Route 53 can't take it.

  Tried to set up a new record in Gandi instead, but that first
  required upgrading to their new DNS platform, which in turn required
  changing nameservers.  The nameserver change could take 12-24 hours
  to take effect.  Could resolve the new ldap.protocol.club record
  directly from the new nameservers, but the change hadn't propagated
  yet.

  DNS looks to have propagated now.
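While waiting on propagation, the new record can be checked by asking the zone's authoritative nameservers directly, bypassing any caching resolver. A minimal sketch (assumes `dig` is installed; the nameserver is looked up rather than hard-coded):

```shell
# Find one of the zone's authoritative nameservers, then query it
# directly for the new record, bypassing local/recursive caches:
NS=$(dig +short NS protocol.club | head -n1)
dig +short @"$NS" ldap.protocol.club A
```

If this returns the expected address while the system resolver still returns nothing, the record exists and the delay is just cache/propagation.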
  Needed to clear the negative cache from systemd's local resolver
  with:

  ,----
  | systemd-resolve --flush-caches
  `----


1.1.3 Manual Setup
------------------

  Following the setup instructions from the README:
  [https://github.com/hashbang/docker-slapd/blob/eb9243f9f39b61e475345a22819d0dcbaeef8439/README.md]

  Created volumes:

  ,----
  | docker volume create slapd-config
  | docker volume create slapd-data
  | docker volume create slapd-ssl
  `----

  Generated certs using Let's Encrypt:

  ,----
  | docker run \
  |   --cap-add=NET_ADMIN \
  |   --name=letsencrypt \
  |   -v slapd-ssl:/config \
  |   -e EMAIL=mike.english@gmail.com \
  |   -e URL=ldap.protocol.club \
  |   -e VALIDATION=http \
  |   -p 80:80 \
  |   -e TZ=UTC \
  |   linuxserver/letsencrypt
  `----

  Mounted the volume to rename the files:

  ,----
  | docker run -v slapd-ssl:/config -ti debian:jessie bash
  `----

  And renamed them:

  ,----
  | # In container:
  | cd config
  | cp etc/letsencrypt/live/ldap.protocol.club/cert.pem ldap.crt
  | cp etc/letsencrypt/live/ldap.protocol.club/privkey.pem ldap.key
  | cp etc/letsencrypt/live/ldap.protocol.club/fullchain.pem ca.cert
  `----

  Copied the `systemd' unit file into place (from a checkout of the
  Git repo):

  ,----
  | cp docker-slapd.service /etc/systemd/system/
  `----

  Generated a new password to use for LDAP administration and stored it
  in 1Password.

  Made a backup of the Docker volumes (just the certs so far) before
  trying to bootstrap the slapd install:

  ,----
  | sudo tar \
  |   -cvpzf ldap-backup.tar.gz \
  |   --exclude=/backup.tar.gz \
  |   --one-file-system /var/lib/docker/volumes/
  | mv ldap-backup.tar.gz ldap-backup-certsonly.tar.gz
  `----

  Bootstrapping slapd:

  ,----
  | /usr/bin/docker pull hashbang/slapd
  | /usr/bin/docker run \
  |   -p 389:389 -p 636:636 \
  |   -v slapd-ssl:/etc/ldap/ssl \
  |   -v slapd-data:/var/lib/ldap \
  |   -v slapd-config:/etc/ldap/slapd.d \
  |   -e ADMIN_PASS=REDACTED \
  |   -e ROOT_PASS=REDACTED \
  |   -e DOMAIN=protocol.club \
  |   -e ORG=protocol.club \
  |   --name="slapd" \
  |   hashbang/slapd
  `----

  I forgot to run the container detached, so when it looked finished, I
  had to open another shell to kill it:

  ,----
  | root@new:~# docker ps
  | CONTAINER ID   IMAGE            COMMAND              CREATED         STATUS         PORTS                                        NAMES
  | 0bdb07335b9a   hashbang/slapd   "bash /tmp/run.sh"   2 minutes ago   Up 2 minutes   0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~# docker kill 0bdb07335b9a
  `----

  When I first tried to start the service, Docker locked up in a bad
  way and would segfault whenever I tried to run `docker ps'.  I tried
  to fix this by stopping the service, but that hung.  I stepped away
  for a while, and my SSH session died.  When I reconnected, I found
  that `docker ps' was working again, and that the service had started
  successfully.

  This seems like it might be a decent smoke test, run remotely:

  ,----
  | $ ldapwhoami -h ldap.protocol.club -D cn=admin,dc=protocol,dc=club -W
  | Enter LDAP Password:
  | dn:cn=admin,dc=protocol,dc=club
  `----

  It looks like I have a running slapd server now.


1.1.4 Memory issues
-------------------

  It looked as though slapd was working properly for a time, but
  behavior was very erratic around restarts.  I noticed in the journal
  that mallocs were failing and that the service was failing to start
  as a result.  Memory usage was extremely high.

  At one point I could no longer start the service at all, so I
  rebooted the host.  When the host came back up, slapd was able to
  start, but it consumed 700+ MB of memory.

  After some searching, it appears that this is a known issue:
  - [https://discuss.linuxcontainers.org/t/empty-openldap-slapd-consuming-800-mb-memory-on-lxc-solved/1022]
  - [https://github.com/nickstenning/docker-slapd/issues/8]
  - [https://github.com/moby/moby/issues/8231]

  It seems we may be able to address this by limiting the number of
  open files allowed to the process.


1.2 Notes [2019-04-05 Fri]
~~~~~~~~~~~~~~~~~~~~~~~~~~

1.2.1 Memory issues cont.
-------------------------

  It looks like an attempt was made to set `ulimit -n 1024' in this
  container's `run.sh':
  [https://github.com/hashbang/docker-slapd/blob/eb9243f9f39b61e475345a22819d0dcbaeef8439/run.sh#L16]

  But when I look at the running process, it doesn't seem to have
  worked:

  ,----
  | root@new:~# cat /proc/$(pidof slapd)/limits | grep files
  | Max open files            1048576              1048576              files
  `----

  On the other hand, this instance of the service is still running from
  last night, which is a good sign that it may be stable if I can solve
  the memory issue:

  ,----
  | root@new:~# docker ps
  | CONTAINER ID   IMAGE            COMMAND              CREATED       STATUS       PORTS                                        NAMES
  | 0c3aa40f7adc   hashbang/slapd   "bash /tmp/run.sh"   8 hours ago   Up 8 hours   0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~# free -m
  |               total        used        free      shared  buff/cache   available
  | Mem:            985         873          61           0          50          16
  | Swap:             0           0           0
  | root@new:~#
  `----


1.2.2 Plan - rebuild image with ulimit fix
------------------------------------------

  I should be able to build my own copy of the image used by this
  service from the Dockerfile in the repo, and modify it so that the
  file limit change actually works.


1.2.3 Observation: Service doesn't stop cleanly
-----------------------------------------------

  This is concerning, and makes me question how this whole service has
  been put together.  I was not able to stop the service cleanly using
  `systemctl':

  ,----
  | root@new:~/docker-slapd# systemctl stop docker-slapd.service
  | root@new:~/docker-slapd# docker ps
  | CONTAINER ID   IMAGE            COMMAND              CREATED       STATUS       PORTS                                        NAMES
  | 0c3aa40f7adc   hashbang/slapd   "bash /tmp/run.sh"   8 hours ago   Up 8 hours   0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~/docker-slapd# systemctl status docker-slapd.service
  | ● docker-slapd.service - #! LDAP Server
  |    Loaded: loaded (/etc/systemd/system/docker-slapd.service; enabled; vendor preset: enabled)
  |    Active: failed (Result: timeout) since Fri 2019-04-05 12:45:46 UTC; 3min 11s ago
  |   Process: 1773 ExecStart=/usr/bin/docker run -p 389:389 -p 636:636 -v slapd-ssl:/etc/ldap/ssl -v slapd-data:/var/lib/ldap -v slapd-config:/etc/ldap/slapd.d --name=slapd hashban
  |   Process: 1767 ExecStartPre=/usr/bin/docker rm slapd (code=exited, status=0/SUCCESS)
  |   Process: 1761 ExecStartPre=/usr/bin/docker kill slapd (code=exited, status=1/FAILURE)
  |   Process: 1733 ExecStartPre=/usr/bin/docker pull hashbang/slapd (code=exited, status=0/SUCCESS)
  |  Main PID: 1773 (code=killed, signal=KILL)
  |
  | Apr 05 04:56:05 new docker[1773]: SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
  | Apr 05 04:56:05 new docker[1773]: SASL SSF: 0
  | Apr 05 04:56:05 new docker[1773]: modifying entry "cn=config"
  | Apr 05 12:44:16 new systemd[1]: Stopping #! LDAP Server...
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: State 'stop-sigterm' timed out. Killing.
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Killing process 1773 (docker) with signal SIGKILL.
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Killing process 1937 (docker) with signal SIGKILL.
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Main process exited, code=killed, status=9/KILL
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Failed with result 'timeout'.
  | Apr 05 12:45:46 new systemd[1]: Stopped #! LDAP Server.
  | root@new:~/docker-slapd# docker ps
  | CONTAINER ID   IMAGE            COMMAND              CREATED       STATUS       PORTS                                        NAMES
  | 0c3aa40f7adc   hashbang/slapd   "bash /tmp/run.sh"   8 hours ago   Up 8 hours   0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~/docker-slapd# pgrep slapd
  | 1882
  | root@new:~/docker-slapd#
  `----

  I'm crossing my fingers that this disagreeable behavior is just the
  result of operating in a low-memory environment.


1.2.4 Alternative option: set ulimit from docker run
----------------------------------------------------

  It looks like we may be able to set the file limit from our `docker
  run' invocation as well:
  [https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container---ulimit]

        Since setting ulimit settings in a container requires extra
        privileges not available in the default container, you can set
        these using the `--ulimit' flag.  `--ulimit' is specified with
        a soft and hard limit as such:
        `<type>=<soft limit>[:<hard limit>]', for example:

  ,----
  | $ docker run --ulimit nofile=1024:1024 --rm debian sh -c "ulimit -n"
  | 1024
  `----


1.3 Notes [2019-04-05 Fri]
~~~~~~~~~~~~~~~~~~~~~~~~~~

1.3.1 Fixed memory issue
------------------------

  Modified the systemd unit to set the nofile ulimit in the `docker
  run' invocation:

  ,----
  | [Unit]
  | Description=#! LDAP Server
  | After=docker.service
  | Requires=docker.service
  |
  | [Service]
  | Restart=always
  | ExecStartPre=-/usr/bin/docker pull hashbang/slapd
  | ExecStartPre=-/usr/bin/docker kill slapd
  | ExecStartPre=-/usr/bin/docker rm slapd
  | ExecStart=/usr/bin/docker run \
  |   --ulimit nofile=1024 \
  |   -p 389:389 -p 636:636 \
  |   -v slapd-ssl:/etc/ldap/ssl \
  |   -v slapd-data:/var/lib/ldap \
  |   -v slapd-config:/etc/ldap/slapd.d \
  |   --name="slapd" \
  |   hashbang/slapd
  |
  | [Install]
  | WantedBy=multi-user.target
  `----


1.3.2 Also fixing the way the service stops the container
---------------------------------------------------------

  Added this line:

  ,----
  | ExecStop=/usr/bin/docker stop slapd
  `----

  The full unit now looks like:

  ,----
  | [Unit]
  | Description=#! LDAP Server
  | After=docker.service
  | Requires=docker.service
  |
  | [Service]
  | Restart=always
  | ExecStartPre=-/usr/bin/docker pull hashbang/slapd
  | ExecStartPre=-/usr/bin/docker kill slapd
  | ExecStartPre=-/usr/bin/docker rm slapd
  | ExecStart=/usr/bin/docker run \
  |   --ulimit nofile=1024 \
  |   -p 389:389 -p 636:636 \
  |   -v slapd-ssl:/etc/ldap/ssl \
  |   -v slapd-data:/var/lib/ldap \
  |   -v slapd-config:/etc/ldap/slapd.d \
  |   --name="slapd" \
  |   hashbang/slapd
  | ExecStop=/usr/bin/docker stop slapd
  |
  | [Install]
  | WantedBy=multi-user.target
  `----
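
  After editing the unit, a verification pass along these lines should
  confirm both fixes (a sketch; the `awk' field position is assumed
  from the `/proc/<pid>/limits' format shown earlier):

```shell
# Pick up the edited unit file and restart with the new settings:
systemctl daemon-reload
systemctl restart docker-slapd.service

# The soft limit is the 4th field of the "Max open files" line in
# /proc/<pid>/limits; it should now read 1024 instead of 1048576:
SOFT=$(awk '/Max open files/ {print $4}' "/proc/$(pidof slapd)/limits")
echo "$SOFT"

# With ExecStop in place, a stop should no longer time out:
systemctl stop docker-slapd.service
docker ps --filter name=slapd   # expect no running slapd container
```

  If the stop still times out, the next thing to check would be
  docker's default 10-second stop grace period versus systemd's
  `TimeoutStopSec'.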