Notes on setting up Dockerized LDAP for new protocol.club host

Table of Contents
_________________

1 Notes on setting up Dockerized LDAP for new protocol.club host
.. 1.1 Notes [2019-04-04 Thu]
..... 1.1.1 Plan for tonight
..... 1.1.2 DNS Complication
..... 1.1.3 Manual Setup
..... 1.1.4 Memory issues
.. 1.2 Notes [2019-04-05 Fri]
..... 1.2.1 Memory issues cont.
..... 1.2.2 Plan - rebuild image with ulimit fix
..... 1.2.3 Observation: Service doesn't stop cleanly
..... 1.2.4 Alternative option: set ulimit from docker run
.. 1.3 Notes [2019-04-05 Fri]
..... 1.3.1 Fixed memory issue
..... 1.3.2 Also fixing the way the service stops the container


1 Notes on setting up Dockerized LDAP for new protocol.club host
================================================================

  Probably use something like this:
  [https://github.com/hashbang/docker-slapd]

  Maybe using Ansible to configure the host itself, and a systemd unit
  to run that container...?

  Possibly borrowing from: [https://github.com/hashbang/admin-tools/]


1.1 Notes [2019-04-04 Thu]
~~~~~~~~~~~~~~~~~~~~~~~~~~

1.1.1 Plan for tonight
----------------------

  Created snapshot of un-configured host

  Plan is to do setup manually at first, and then to automate, probably
  with Ansible


1.1.2 DNS Complication
----------------------

  First complication: can't obtain certs from Let's Encrypt without DNS
  for ldap.protocol.club

  DNS still hosted by gandi

  Tried to transfer to Route 53 but transfer is not allowed because
  "protocol.club" is listed as a "premium" domain for the .club TLD and
  Route 53 can't take it

  Tried to set up new record in Gandi, but that first required upgrading
  to their new DNS platform, which required changing nameservers

  Nameserver change could take 12-24 hours to take effect

  Can resolve new ldap.protocol.club record directly from new
  nameservers, but that change hasn't propagated yet

  DNS looks to have propagated now.

  Needed to clear negative cache from systemd's local resolver with:

  ,----
  | systemd-resolve --flush-caches
  `----
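
  A rough way to watch propagation, sketched as a shell function: ask
  each authoritative nameserver for the record, then ask the local
  resolver, and compare. The function name `check_propagation' is made
  up for these notes, and it assumes `dig' is installed.

```shell
# Hypothetical helper: print the first A record each source returns for a name.
check_propagation() {
    zone=$1
    name=$2
    # Ask every nameserver listed for the zone directly.
    for ns in $(dig +short NS "$zone"); do
        answer=$(dig +short A "$name" "@$ns" | head -n 1)
        printf '%s -> %s\n' "$ns" "${answer:-no answer}"
    done
    # Then ask whatever resolver this machine normally uses.
    answer=$(dig +short A "$name" | head -n 1)
    printf 'local resolver -> %s\n' "${answer:-no answer}"
}
```

  Usage would be `check_propagation protocol.club ldap.protocol.club'.
  If the nameservers answer but the local resolver does not, the change
  has likely not propagated (or a negative answer is still cached).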


1.1.3 Manual Setup
------------------

  Following the setup instructions from the README:
  [https://github.com/hashbang/docker-slapd/blob/eb9243f9f39b61e475345a22819d0dcbaeef8439/README.md]

  Created volumes:

  ,----
  | docker volume create slapd-config
  | docker volume create slapd-data
  | docker volume create slapd-ssl
  `----

  Generated certs using Let's Encrypt:

  ,----
  | docker run \
  |  --cap-add=NET_ADMIN \
  |  --name=letsencrypt \
  |  -v slapd-ssl:/config \
  |  -e EMAIL=mike.english@gmail.com \
  |  -e URL=ldap.protocol.club \
  |  -e VALIDATION=http \
  |  -p 80:80 \
  |  -e TZ=UTC \
  |  linuxserver/letsencrypt
  `----

  Mounted the volume to rename the files:

  ,----
  | docker run -v slapd-ssl:/config -ti debian:jessie bash
  `----

  And renamed them:

  ,----
  | # In container:
  | cd config
  | cp etc/letsencrypt/live/ldap.protocol.club/cert.pem ldap.crt
  | cp etc/letsencrypt/live/ldap.protocol.club/privkey.pem ldap.key
  | cp etc/letsencrypt/live/ldap.protocol.club/fullchain.pem ca.cert
  `----
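
  Before pointing slapd at the renamed files, it may be worth
  confirming that the certificate and key actually belong together. A
  minimal sketch, assuming `openssl' is available wherever the files are
  checked (it may not be installed in a bare `debian:jessie' container).
  The function name is hypothetical:

```shell
# Hypothetical helper: succeed only if the certificate's public key
# matches the public key derived from the private key.
cert_matches_key() {
    cert_pub=$(openssl x509 -noout -pubkey -in "$1") || return 1
    key_pub=$(openssl pkey -pubout -in "$2") || return 1
    [ "$cert_pub" = "$key_pub" ]
}
```

  For example: `cert_matches_key ldap.crt ldap.key && echo match'.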

  Copied `systemd' unit file into place (from checkout of the Git repo):
  ,----
  | cp docker-slapd.service /etc/systemd/system/
  `----

  Generated a new password to use for LDAP administration, stored it in
  1Password

  Made a backup of the docker volumes (just the certs so far) before
  trying to bootstrap the slapd install:

  ,----
  | sudo tar \
  |  -cvpzf ldap-backup.tar.gz \
  |  --exclude=/backup.tar.gz \
  |  --one-file-system /var/lib/docker/volumes/
  | mv ldap-backup.tar.gz ldap-backup-certsonly.tar.gz 
  `----
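
  A backup is only useful if it contains what is expected, so a quick
  listing helps. A sketch, assuming GNU tar stored the paths with the
  leading `/' stripped (its default behavior). The function name is made
  up:

```shell
# Hypothetical helper: list the Docker volume names present in the archive.
backup_volumes() {
    tar -tzf "$1" |
        sed -n 's|^var/lib/docker/volumes/\([^/]*\)/.*|\1|p' |
        sort -u
}
```

  Running `backup_volumes ldap-backup-certsonly.tar.gz' should include
  slapd-config, slapd-data, and slapd-ssl in its output.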

  Bootstrapping slapd:
  ,----
  | /usr/bin/docker pull hashbang/slapd
  | /usr/bin/docker run \
  |  -p 389:389 -p 636:636 \
  |  -v slapd-ssl:/etc/ldap/ssl \
  |  -v slapd-data:/var/lib/ldap \
  |  -v slapd-config:/etc/ldap/slapd.d \
  |  -e ADMIN_PASS=REDACTED \
  |  -e ROOT_PASS=REDACTED \
  |  -e DOMAIN=protocol.club \
  |  -e ORG=protocol.club \
  |  --name="slapd" \
  |  hashbang/slapd
  `----

  I ran this in the foreground with no clean way to detach from it, so
  when the bootstrap looked finished, I had to open another shell to
  kill it:
  ,----
  | root@new:~# docker ps
  | CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS                                        NAMES
  | 0bdb07335b9a        hashbang/slapd      "bash /tmp/run.sh"   2 minutes ago       Up 2 minutes        0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~# docker kill 0bdb07335b9a
  `----

  When I first tried to start the service, docker locked up in a bad way
  and would segfault whenever I tried to run `docker ps'.

  I tried to fix this by stopping the service, but that hung. I stepped
  away for a while, and my SSH session died.

  When I reconnected, I found that `docker ps' was working again, and
  that the service had started successfully.

  This seems like it might be a decent smoke test, run remotely:

  ,----
  | $ ldapwhoami -h ldap.protocol.club -D cn=admin,dc=protocol,dc=club -W
  | Enter LDAP Password: 
  | dn:cn=admin,dc=protocol,dc=club
  `----
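
  For repeated runs, the same check can be wrapped in a function that
  compares the reply with the expected DN. This is a sketch: the
  function name and the password-file path are hypothetical. It uses
  `-H' with an LDAP URI in place of `-h', adds `-x' to request a simple
  bind explicitly, and uses `-y' to read the password from a file so the
  check can run without a prompt.

```shell
# Hypothetical helper: succeed if a simple bind works and the server
# reports the DN we bound as.
ldap_smoke_test() {
    uri=$1
    bind_dn=$2
    pw_file=$3   # whole file is the password, so no trailing newline
    reply=$(ldapwhoami -x -H "$uri" -D "$bind_dn" -y "$pw_file" 2>/dev/null) || return 1
    [ "$reply" = "dn:$bind_dn" ]
}
```

  For example, with the password stored in a hypothetical `~/.ldap-pass':
  `ldap_smoke_test ldap://ldap.protocol.club cn=admin,dc=protocol,dc=club ~/.ldap-pass && echo ok'.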

  It looks like I have a running slapd server now.


1.1.4 Memory issues
-------------------

  It looked as though slapd was working properly, for a time, but
  behavior was very erratic around restarts.

  I noticed in the journal that mallocs were failing and the service was
  failing to start as a result.

  I noticed that the memory usage was extremely high. At one point I
  could no longer start the service. I rebooted the host altogether.

  When the host came back up, slapd was able to start, but it consumed
  700+MB of memory.

  After some searching, it appears that this is some kind of known
  issue:

  - [https://discuss.linuxcontainers.org/t/empty-openldap-slapd-consuming-800-mb-memory-on-lxc-solved/1022]
  - [https://github.com/nickstenning/docker-slapd/issues/8]
  - [https://github.com/moby/moby/issues/8231]

  It seems we may be able to address this by limiting the number of
  open files allowed to the process.


1.2 Notes [2019-04-05 Fri]
~~~~~~~~~~~~~~~~~~~~~~~~~~

1.2.1 Memory issues cont.
-------------------------

  It looks like an attempt was made to set `ulimit -n 1024' in this
  container's `run.sh':
  [https://github.com/hashbang/docker-slapd/blob/eb9243f9f39b61e475345a22819d0dcbaeef8439/run.sh#L16]

  But when I look at the running process, it doesn't seem to have
  worked:

  ,----
  | root@new:~# cat /proc/$(pidof slapd)/limits | grep files
  | Max open files            1048576              1048576              files     
  `----
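
  The columns in that output are soft limit, hard limit, and units. A
  small sketch for pulling out just the soft limit so it can be compared
  against the intended 1024. The function name is hypothetical, and it
  takes the path to a `limits' file so it can be pointed at any process:

```shell
# Hypothetical helper: print the soft "Max open files" limit from a
# /proc/<pid>/limits file.
nofile_soft_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}
```

  For example: `nofile_soft_limit /proc/$(pidof slapd)/limits'.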

  On the other hand, this instance of the service is still running from
  last night, which is a good sign that it may be stable if I can solve
  the memory issue:

  ,----
  | root@new:~# docker ps
  | CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS                                        NAMES
  | 0c3aa40f7adc        hashbang/slapd      "bash /tmp/run.sh"   8 hours ago         Up 8 hours          0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~# free -m
  |               total        used        free      shared  buff/cache   available
  | Mem:            985         873          61           0          50          16
  | Swap:             0           0           0
  | root@new:~# 
  `----


1.2.2 Plan - rebuild image with ulimit fix
------------------------------------------

  I should be able to build my own copy of the image used by this
  service from the Dockerfile in the repo and modify it such that the
  file limit change works


1.2.3 Observation: Service doesn't stop cleanly
-----------------------------------------------

  This is concerning, and makes me question how this whole service has
  been put together.

  I was not able to stop the service cleanly using `systemctl':

  ,----
  | root@new:~/docker-slapd# systemctl stop docker-slapd.service
  | root@new:~/docker-slapd# docker ps
  | CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS                                        NAMES
  | 0c3aa40f7adc        hashbang/slapd      "bash /tmp/run.sh"   8 hours ago         Up 8 hours          0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~/docker-slapd# systemctl status docker-slapd.service
  | ● docker-slapd.service - #! LDAP Server
  |   Loaded: loaded (/etc/systemd/system/docker-slapd.service; enabled; vendor preset: enabled)
  |   Active: failed (Result: timeout) since Fri 2019-04-05 12:45:46 UTC; 3min 11s ago
  |   Process: 1773 ExecStart=/usr/bin/docker run -p 389:389 -p 636:636 -v slapd-ssl:/etc/ldap/ssl -v slapd-data:/var/lib/ldap -v slapd-config:/etc/ldap/slapd.d --name=slapd hashban
  |   Process: 1767 ExecStartPre=/usr/bin/docker rm slapd (code=exited, status=0/SUCCESS)
  |   Process: 1761 ExecStartPre=/usr/bin/docker kill slapd (code=exited, status=1/FAILURE)
  |   Process: 1733 ExecStartPre=/usr/bin/docker pull hashbang/slapd (code=exited, status=0/SUCCESS)
  | Main PID: 1773 (code=killed, signal=KILL)
  | 
  | Apr 05 04:56:05 new docker[1773]: SASL username: gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth
  | Apr 05 04:56:05 new docker[1773]: SASL SSF: 0
  | Apr 05 04:56:05 new docker[1773]: modifying entry "cn=config"
  | Apr 05 12:44:16 new systemd[1]: Stopping #! LDAP Server...
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: State 'stop-sigterm' timed out. Killing.
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Killing process 1773 (docker) with signal SIGKILL.
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Killing process 1937 (docker) with signal SIGKILL.
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Main process exited, code=killed, status=9/KILL
  | Apr 05 12:45:46 new systemd[1]: docker-slapd.service: Failed with result 'timeout'.
  | Apr 05 12:45:46 new systemd[1]: Stopped #! LDAP Server.
  | root@new:~/docker-slapd# docker ps
  | CONTAINER ID        IMAGE               COMMAND              CREATED             STATUS              PORTS                                        NAMES
  | 0c3aa40f7adc        hashbang/slapd      "bash /tmp/run.sh"   8 hours ago         Up 8 hours          0.0.0.0:389->389/tcp, 0.0.0.0:636->636/tcp   slapd
  | root@new:~/docker-slapd# pgrep slapd
  | 1882
  | root@new:~/docker-slapd# 
  `----

  I'm crossing my fingers that this disagreeable behavior might be the
  result of operating in a low free memory environment


1.2.4 Alternative option: set ulimit from docker run
----------------------------------------------------

  It looks like we may be able to set the file limit from our `docker
  run' invocation as well:

  [https://docs.docker.com/engine/reference/commandline/run/#set-ulimits-in-container---ulimit]

        Since setting ulimit settings in a container requires
        extra privileges not available in the default container,
        you can set these using the `--ulimit' flag. `--ulimit' is
        specified with a soft and hard limit as such:
        `<type>=<soft limit>[:<hard limit>]', for example:

        ,----
        | $ docker run --ulimit nofile=1024:1024 --rm debian sh -c "ulimit -n"
        | 1024
        `----


1.3 Notes [2019-04-05 Fri]
~~~~~~~~~~~~~~~~~~~~~~~~~~

1.3.1 Fixed memory issue
------------------------

  Modified the systemd unit to set ulimit for nofile in the `docker run'
  invocation

  ,----
  | [Unit]
  | Description=#! LDAP Server
  | After=docker.service
  | Requires=docker.service
  | 
  | [Service]
  | Restart=always
  | ExecStartPre=-/usr/bin/docker pull hashbang/slapd
  | ExecStartPre=-/usr/bin/docker kill slapd
  | ExecStartPre=-/usr/bin/docker rm slapd
  | ExecStart=/usr/bin/docker run \
  |   --ulimit nofile=1024 \
  |   -p 389:389 -p 636:636 \
  |   -v slapd-ssl:/etc/ldap/ssl \
  |   -v slapd-data:/var/lib/ldap \
  |   -v slapd-config:/etc/ldap/slapd.d \
  |   --name="slapd" \
  |   hashbang/slapd
  | 
  | [Install]
  | WantedBy=multi-user.target
  `----
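
  After editing the unit, systemd has to reload it before a restart
  picks up the change. A sketch of applying and then checking the fix,
  assuming the container image provides `sh' for `docker exec'. The
  function names are made up:

```shell
# Hypothetical helper: reload unit files and restart the service.
apply_unit_change() {
    systemctl daemon-reload && systemctl restart docker-slapd.service
}

# Hypothetical helper: succeed if a new process in the container sees
# the expected open-file limit.
container_nofile_is() {
    actual=$(docker exec slapd sh -c 'ulimit -n') || return 1
    [ "$actual" = "$1" ]
}
```

  For example: `apply_unit_change && container_nofile_is 1024 && echo ok'.
  The check may need a short wait after the restart, since the container
  takes a moment to come up.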


1.3.2 Also fixing the way the service stops the container
---------------------------------------------------------

  Added this line:
  ,----
  | ExecStop=/usr/bin/docker stop slapd
  `----

  Full unit now looks like:

  ,----
  | [Unit]
  | Description=#! LDAP Server
  | After=docker.service
  | Requires=docker.service
  | 
  | [Service]
  | Restart=always
  | ExecStartPre=-/usr/bin/docker pull hashbang/slapd
  | ExecStartPre=-/usr/bin/docker kill slapd
  | ExecStartPre=-/usr/bin/docker rm slapd
  | ExecStart=/usr/bin/docker run \
  |   --ulimit nofile=1024 \
  |   -p 389:389 -p 636:636 \
  |   -v slapd-ssl:/etc/ldap/ssl \
  |   -v slapd-data:/var/lib/ldap \
  |   -v slapd-config:/etc/ldap/slapd.d \
  |   --name="slapd" \
  |   hashbang/slapd
  | ExecStop=/usr/bin/docker stop slapd
  | 
  | [Install]
  | WantedBy=multi-user.target
  `----
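
  To confirm that stopping the service now also stops the container,
  something like the following could be run after
  `systemctl stop docker-slapd.service'. A sketch; the function name is
  hypothetical:

```shell
# Hypothetical helper: succeed only if the named container exists and
# is currently running.
container_running() {
    state=$(docker inspect --format '{{.State.Running}}' "$1" 2>/dev/null) || return 1
    [ "$state" = "true" ]
}
```

  For example: `container_running slapd || echo "slapd container is stopped"'.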