add readme; remove obsoleteness - amprolla - devuan's apt repo merger
git clone git://parazyd.org/amprolla.git
---
commit 1d9670ade4cc7c28dfd1c6de9bc14ca099be0c9d
parent 0454dba27c9b281b9eaca4b75184a8bc1f54cf15
Author: parazyd 
Date:   Mon,  5 Jun 2017 21:47:59 +0200

add readme; remove obsoleteness

Diffstat:
  A README.md                           |      23 +++++++++++++++++++++++
  D doc/dan-notes                       |     109 -------------------------------
  M orchestrate.py                      |       6 ++----

3 files changed, 25 insertions(+), 113 deletions(-)
---
diff --git a/README.md b/README.md
@@ -0,0 +1,23 @@
+amprolla
+========
+
+amprolla is an apt repository merger originally intended for use with
+the [Devuan](https://devuan.org) infrastructure. This version is the
+third iteration of the software. The original version of amprolla was
+not performing well in terms of speed, and the second version was never
+finished - therefore this version has emerged.
+
+Dependencies
+------------
+
+### Devuan
+
+```
+gnupg2 python3-requests python3-gnupg
+```
+
+### Gentoo
+
+```
+app-crypt/gnupg dev-python/requests dev-python/python-gnupg
+```
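At its core, an apt repository merger combines the Packages (and Sources) stanzas of several mirrors. The following is a minimal Python sketch of that step. It rests on a few assumptions:

- Each mirror's file is already downloaded and decompressed.
- Mirrors are listed highest priority first.
- A clash on package name keeps the higher-priority stanza, as the removed doc/dan-notes describe.

`parse_stanzas` and `merge_by_priority` are hypothetical names, not amprolla's API. Real Packages files may also list several versions of one package, which this sketch ignores.

```python
# Hypothetical illustration, not amprolla's code: assumes decompressed
# Packages/Sources text and mirrors ordered highest priority first.


def parse_stanzas(text):
    """Split a Packages or Sources file (RFC822-style stanzas) into dicts.

    Stanzas are separated by blank lines; a line starting with a space or
    tab continues the value of the previous field.
    """
    stanzas, current, last_key = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            # blank line: the current stanza (if any) is complete
            if current:
                stanzas.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            current[last_key] += "\n" + line
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def merge_by_priority(mirrors):
    """Merge stanza lists from mirrors ordered highest priority first.

    On a package-name clash the stanza seen first (higher priority) is kept;
    packages only present in lower-priority mirrors are added.
    """
    merged = {}
    for stanzas in mirrors:
        for stanza in stanzas:
            merged.setdefault(stanza["Package"], stanza)
    return merged


if __name__ == "__main__":
    devuan = parse_stanzas("Package: foo\nVersion: 1.0+devuan\n\n"
                           "Package: bar\nVersion: 2.0\n")
    debian = parse_stanzas("Package: foo\nVersion: 1.0\n\n"
                           "Package: baz\nVersion: 3.0\n")
    merged = merge_by_priority([devuan, debian])
    print(sorted(merged))            # ['bar', 'baz', 'foo']
    print(merged["foo"]["Version"])  # 1.0+devuan
```

Keying the merged result on the package name makes each clash check a dictionary lookup rather than a scan over previously merged stanzas.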
diff --git a/doc/dan-notes b/doc/dan-notes
@@ -1,109 +0,0 @@
-Ok... so the debian repo is essentially a directory hierarchy...
-
-Ok.. Do you understand the repo hierarchy? ie the main folder (in
-amprolla case /merged) with sub folders 'dists' (for repo metadata) and
-'pool' (where the actual binary and source packages go)??
-forget about the "pool" folder, amprolla doesn't touch it...
-
-in "dists/" you have all the suites ie: jessie, ascii, ceres and all
-the stable, unstable and version symlinks.
-
-in the suite folder, you find the section folders: main contrib non-free
-and files InRelease, Release and Release.gpg
-
-InRelease is just the pgp/smime version of the Release file - the gpg
-sig is the same as Release.gpg
-
-Anyway the Release file basically is a dictionary of most of the files
-in the subdirectory with size and checksums (SHA256, SHA512 etc) in what
-is essentially RFC822 format, with a bunch of headers at the top that
-specify details about the Release of that suite.
-
-In the suite subdirectories you have a bunch of folders, binary-
-which contains the Packages file, and compressed copies of that, and a
-Release Stanza, and similar for the source folder with Sources file and
-compressed copies etc.
-
-the Contents files (currently not processed) are there too.
-(They contain a list of all the files in each package)
-
-there is also the i18n folder which contains the processed files.
-oops s/processed files/translation files/
-
-
-Amprolla takes several mirrors and merges them in order of priority
-starting with the highest priority. It first iterates over the structure
-to create its repo structure, ie dists/// etc and then first
-copies the highest priority mirror Packages and Sources files in and then for
-the other mirrors iterates over the Packages and Sources files and compares
-each package stanza for a match, and if there is a match on name then the highest
-priority mirror version is kept, if not then the package is added in.
-(This is where the inefficient model really shows up)
-
-
-After all the new Sources and Packages files are processed then the Release and
-InRelease files are generated by walking the hierarchy and adding those files in.
-
-There are a lot of complexities, part of which is in the design of amprolla.
-What I had started to do, and in describing it now, it seems obvious to me
-I should probably have started pretty much from scratch, is instead of this
-iterative approach of compare and add or skip, keep a cache of each mirror's
-last state, and then on each run create a delta between the last state and
-current state.
-
-
-* and how does dak integrate in all of this?
-it doesn't. Dak is a standalone repository which just deals with the packages built by our CI
-* so it's the same as any debian repo
-Yup, slightly modified to handle our CI and some other tweaks
-and I checked and our version is in gdo too.
-
-
-anyway as I was saying about my approach re deltas:
-There are big efficiencies in this approach. For starters, we only download the InRelease or
-Release and Release.gpg file and after verifying it, compare to the previous state, and we
-can use the delta generated to pick what files are new, changed or removed from the repo.
-This means we only download the changed files in the repo for a start. And for the
-Packages and Sources files we create a delta list of changed stanzas to apply.
-
-Instead of building the entire repo from scratch, we apply the delta
-to a copy of our merged repo with handling for priority etc...
-
-What stumped me in the end is we actually should verify that we only have packages go in that
-have a matching source stanza and we really need to process the contents and translations
-at the same time.
-
-I suspect that nextime realised this which is why he started on amprolla2 which essentially
-replicates dak + amprolla function...
-
-I just realised, I forgot to mention the overrides processing in amprolla. In the very
-top of the dir in "merged/" is the "indices" folder that contains overrides. These
-files specify for each Packages file any metadata changes that need to be applied to
-package stanzas
-
-In debian there is an entry for every single deb package/source in the archive making
-them very large. We did away with that to reduce the overhead of processing it created.
-
-So we only have entries for those that need changing, usually to change priorities of
-systemd packages and remove recommends and suggests for systemd related packages.
-
-* are indices a part of the repo or only needed by amprolla?
-both. In debian, dak generates them and they are hand modified by the repo masters to
-apply needed fixes. With amprolla, we only create them for applying our own changes as needed.
-Technically they don't need to be in the repo, as they're not used by apt, but practically
-it's good to have them there.
-
-hmmm, I think I've cracked my problem...
-If I use the Sources delta to identify changed packages, I can use that to pick and apply
-the changed Packages stanzas, Contents and Translations. This would save lots of
-iterations, and I only need the delta processing to be done on the Sources files.
-Wow that would really speed things up
-
-The other benefit is we can side load packages this way too and use it to replace dak
-as well as either a standalone repo or directly into the merged repo.
-And all without a hefty database. or the writeup
-
-you're welcome. It has helped me probably as much as you. I think it's
-turning into a full rewrite, but seems better design and possibly far easier to
-write from scratch.
-Anyway, it's nearly 3:30am here, so better get a couple hours sleep!
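The notes above propose a delta-based approach: compare each mirror's last Release state with the current one, then fetch only what changed. That comparison can be sketched with the standard library. It rests on a few assumptions:

- The InRelease or Release.gpg signature has already been verified.
- The Release file's `SHA256:` section lists one indented ` <checksum> <size> <path>` entry per line.

The helper names `release_checksums` and `release_delta` are hypothetical.

```python
# Hypothetical illustration of the delta idea in the notes: assumes the
# Release signature was already verified and entries have no spaces in paths.


def release_checksums(release_text, field="SHA256"):
    """Return {path: (checksum, size)} from one checksum section of a Release file.

    Entries are the indented lines following a top-level field such as
    "SHA256:"; any other top-level field ends the section.
    """
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith((" ", "\t")):
            if in_section:
                checksum, size, path = line.split()
                entries[path] = (checksum, int(size))
        else:
            in_section = line.split(":", 1)[0] == field
    return entries


def release_delta(old, new):
    """Compare two mappings; return sorted (added, changed, removed) keys."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, changed, removed


if __name__ == "__main__":
    previous = release_checksums(
        "Suite: ceres\nSHA256:\n"
        " aa 10 main/binary-amd64/Packages\n"
        " bb 20 main/source/Sources\n")
    current = release_checksums(
        "Suite: ceres\nSHA256:\n"
        " cc 12 main/binary-amd64/Packages\n"
        " dd 5 contrib/source/Sources\n")
    print(release_delta(previous, current))
    # (['contrib/source/Sources'], ['main/binary-amd64/Packages'], ['main/source/Sources'])
```

`release_delta` only compares two mappings, so the same function could be reused on `{package_name: stanza}` dicts. That would give the per-stanza delta the notes mention for Packages and Sources files.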
diff --git a/orchestrate.py b/orchestrate.py
@@ -2,7 +2,7 @@
 # see LICENSE file for copyright and license details
 
 """
-Module used to orchestrace the entire amprolla merge
+Module used to orchestrate the entire amprolla merge
 """
 
 from os.path import join
@@ -12,8 +12,6 @@ from lib.config import (arches, categories, suites, mergedir, mergesubdir,
                         pkgfiles, srcfiles, spooldir, repos)
 from lib.release import write_release
 
-# from pprint import pprint
-
 
 def do_merge():
     """
@@ -33,7 +31,7 @@ def do_merge():
 
     am = __import__('amprolla_merge')
 
-    p = Pool(4)
+    p = Pool(4)  # Set it to the number of CPUs you want to use
     p.map(am.main, pkg)
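The comment added to `Pool(4)` invites tuning the worker count. Below is a minimal sketch of one way to do that, assuming the default should be the machine's CPU count. `pool_size` and `run_merge` are hypothetical helpers, not part of amprolla. Note that `multiprocessing.Pool()` with no argument already defaults to the CPU count, so the helper mainly makes an explicit override easy.

```python
import os
from multiprocessing import Pool


def pool_size(requested=None):
    """Pick the worker count: a positive explicit request wins, else the CPU count.

    os.cpu_count() can return None, so fall back to a single worker.
    """
    if requested is not None and requested > 0:
        return requested
    return os.cpu_count() or 1


def run_merge(items, worker, processes=None):
    """Map `worker` over `items` in a process pool sized by pool_size().

    Hypothetical helper: `worker` must be a picklable top-level function
    (as am.main is in orchestrate.py). Leaving the with-block terminates
    the pool, which is safe here because map() has already returned.
    """
    with Pool(pool_size(processes)) as pool:
        return pool.map(worker, items)
```

In `do_merge()`, `run_merge(pkg, am.main)` could stand in for `p = Pool(4)` followed by `p.map(am.main, pkg)`. Passing `run_merge(pkg, am.main, 4)` keeps the current four-process behaviour.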