MilliwaysStack

We want to run an OpenStack experiment

The grander idea

We want to try out an installation of OpenStack to give people around milliways experience with running it (and running things on it).

From an unnamed source we got 10 HPE servers. We will use 8 of them to run OpenStack. Storage is on a separate machine.

MVP

The MVP would be:

  • Kubernetes / Docker
  • Object storage
  • File systems
  • Networking
  • Virtual machines
  • Firewalling
  • Databases - MariaDB / PostgreSQL
  • Something something Redis, I guess
  • Container registry

e-MVP

The extended MVP would be:

  • Functional monitoring & alerting
  • Autoscaling
  • Integration into the milliways identity & access management (authentik)
  • Logging

The software stack explained

OpenStack is a cloud framework that offers AWS / Azure / Google Cloud-like services.

Most documentation is available for Ubuntu & Red Hat. In the longer term an installation under NixOS might be feasible.

Asset List

Rack

  • 47U
  • 950mm external depth
    • 915mm internal depth

Consumables & Small Materials

  • 1 x Samsung 2.5" 860 EVO 2TB
  • Assorted M2 - M3 screws
  • Assorted mismatched bundle of M5 and M6 cage nuts and bolts
  • SFPs

Switches

  • 2 x Dell PowerConnect 7048R-RA
  • 1 x Cisco 3560e

Servers

  • 1 x Dell PowerEdge R710 (used as storage)
    • 2 x X5570 2.93 GHz
    • 192 GB RAM
    • 6 x 3.5" bays
      • 6 x hotswap 3.5" drive sleds/brackets
    • Drives
      • 1 x Samsung 2.5" 850 EVO 500GB
      • We have more drives than bays, but not enough matching drives for a nice or ideal configuration. As such, the Dell storage situation is likely temporary until we figure out whether to add more 12T or 10T drives or keep it as-is.
        • 2 x Seagate Exos X18 12TB
        • 1 x Seagate Exos X18 10TB
        • 4 x WD Red 4TB
        • 4 x WD Green 3TB
    • no rails
  • 2 x HPE ProLiant DL380 Gen8
    • 2 x E5-2620 v3 2.4 GHz
    • 384 GB RAM
    • PCI Riser to 4* NVMe adapter
      • 1TB Crucial NVMe
    • iLO4
      • It seems to accept 35DPH-SVSXJ-HGBJN-C7N5R-2SS4W as an activation key for an iLO Advanced license?
    • without hard drives, but has 2.5" bays
      • no drive sleds/brackets available, only blanks
    • Slide rails
  • 8 x HPE ProLiant DL380 Gen8
    • 2 x E5-2620 v3 2.4 GHz
    • 384 GB RAM
    • iLO4
    • without hard drives, but has 2.5" bays
      • no drive sleds/brackets available, only blanks
    • 7 x slide rails

Shopping List

It's of course sexy as all hell to buy memory, AI cards, flash storage and all sorts, but literally none of that will ever work if we don't have our generic basics in order. While we prefer big donations go to big-ticket items, many small-ticket items unexpectedly add up in the long run. Please do not forget the generic basics!

  • Generic Basics
    • PDU
      • Temporary: 1U unmanaged PDU with 16A/230V C19 input and 1* C19 + 8* Type F outlets.
      • Perfect: managed rack-mountable PDU with CEE red 16A/20A 400V input to C13/C14 + C19/C20 outlets.
      • Alternatively: a "normal" server rack PDU (still strongly prefer managed) + a 16A/20A 400V -> 16A 230V transformer
    • Network Cables
      • Some actual properly matching cables would be great
      • [Color]
        • [Type],[Amount],[Length]
    • Power Cables
      • Some actual properly matching cables would be great
      • [Type],[Amount],[Length]
    • Screws, Nuts, Bolts
      • Assorted M2,M2.5,M3 Screws
      • Some actual properly matching cage nuts\bolts would be great
    • PCI Risers
      • Single NVMe adapters
      • Multi NVMe adapters
    • KVM
      • PiKVM?
  • Dell - Storage
    • 2* Drive sleds
    • New RAID Card that supports passthrough\JBOD
    • 2* SFF-8087 -> SFF-8087 Mini SAS Cable
    • Drives
      • 500GB SSD for OS
      • Bracket and SATA Cable Adapter for SSD
      • Technically not shopping, but for historical tracking;
        • Old Exos X16 2 x 12T and 1 x 10T were RMA'd and replaced with X18's
      • 12T ?
  • HP1 - Control
    • 1* PCI riser to 4*NVMe adapter
    • 1* 1TB NVMe
  • HP2 - Compute
    • 1* PCI riser to 4*NVMe adapter
    • 1* 1TB NVMe
  • Flash Storage
    • We'll need Drive Trays for the HPs if we wanna add 2.5" SSDs
    • Control and Compute servers each have 3 open m.2 NVMe slots
    • 1 x 2TB Samsung 860 EVO

Documentation

NB: this is quick 'n' dirty as I go along.
In the short term I'd much rather replace this ad-hoc documentation with something like NetBox.

Network

  • Supernet 10.42.0.0/16
    • Vlan 42
      • Interconnect
      • 10.42.0.0/30
        • Gateway 10.42.0.1
        • Milliways Core 10.42.0.2
    • Vlan 5
      • Mgmt \ OOB
      • 10.42.1.0/24
        • Milliways Core 10.42.1.1
        • Dell iDRAC 10.42.1.5
        • Dell RAID Controller 10.42.1.6
        • HP 1 iLO 10.42.1.7
        • HP 2 iLO 10.42.1.8
    • Vlan 10
      • Prod
      • 10.42.10.0/24
        • Milliways Core 10.42.10.1
        • Dell 10.42.10.2
        • HP 1 10.42.10.3
        • HP 2 10.42.10.5
    • Vlan 15
      • Provider Network
        • This is an OpenStack thing for the secondary Control and Compute node interfaces.
        • Currently no IP address assigned.
        • May change in future if documentation mandates it.
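
As a sanity check for the addressing above, here is a minimal sketch of how a node gets attached to the Prod VLAN with iproute2. The interface name eno1 is an assumption (actual NIC names are not recorded here); the address and gateway are HP 1's Prod entries from the list above.

  # hypothetical sketch: tag VLAN 10 (Prod) on HP 1's uplink NIC
  # "eno1" is assumed; substitute the real interface name
  ip link add link eno1 name eno1.10 type vlan id 10
  ip addr add 10.42.10.3/24 dev eno1.10
  ip link set eno1.10 up
  ip route add default via 10.42.10.1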

Cable Mgmt

As there are some early ambitions to physically take this environment to events, perhaps we should seriously think about making our lives easier by color-coding connectivity from the start. While this will help us reconnect everything at $event when we're sleep-deprived\drunk\explaining to newbies, it has the added effect of making it all look slightly cooler than a spaghetti of boring white cables or, worse, a spaghetti of whatever the fuck we have lying around.

This is all just made up without too much thought. It is specifically intended to start a discussion so we can work toward an agreement; it is not meant as a unilateral decision. Example: you'll notice zero thought was put into fiber or not ;)

  • RED
    • Mgmt \ OOB
      • iDRACs, iLOs, RAID Cards, etc
  • GREEN
    • Storage Prod
      • At least the Dell, maybe HPs if we get into flash storage
  • BLUE
    • Compute Prod
      • Likely overwhelmingly the HPs
  • YELLOW
    • Interconnect
      • Connectivity to $outside, between switches, whatever

Naming Convention

We need names!
Can't keep calling these "Dell", "HP1", "HP2" etc.
Calling them by their S/Ns is also super boring and cumbersome; "Oh yea, we need to set up 5V6S064"
We could even opt for dual names. Internally, when logged in to $shell, the names could be functional, like "milliways-control-node-1", so it's clear what you're doing, while externally the asset tag could be a Hitchhiker's Guide to the Galaxy character or a Discworld town or something. That way, if we do ever show this off at events, we can do cool shit with light-up tags and make stuff funny, recognizable and cool to talk about. It also makes marketing way more relatable when asking for donations; "Ya, we're looking for extra storage for Überwald" sounds much better than "Ya, we're looking for extra storage for 5V6S064 or milliways-control-node-1".
Naturally, once we get NetBox going, we can map the asset names to the actual server names and potentially their serials so we don't get confused internally (if we want to use serials at all; there's something to be said for not using serials here).

  • Functional
    • milliways-control-node-1
    • milliways-control-node-2
    • control-node-1
    • compute-node-1
    • flash-storage-1
  • Marketing
    • HGttG characters
      • Arthur
      • Ford
      • Zaphod
    • Discworld locations
      • Ankh-Morpork
      • Überwald
      • Lancre

OpenStack

We're using 2025.1 (Epoxy) because 2025.2 (Flamingo) has an undocumented breaking change that makes installation of Keystone impossible. We have filed a bug against the documentation on Launchpad for this.
Following the installation guide's recommendation, passwords are created with openssl rand -hex 10 and saved in a password store.
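
For example, assuming the password store is pass (the entry name below is made up for illustration):

  # generate a 20-character hex secret for a service account
  openssl rand -hex 10
  # or pipe it straight into the store
  openssl rand -hex 10 | pass insert -e milliwaysstack/keystone-admin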

Controller

  • Identity service
    • Broken in 2025.2
      • An upstream commit removes the WSGI scripts keystone-wsgi-admin and keystone-wsgi-public.
      • Both scripts are still expected to be present (see the Apache error below), so running any openstack command to create domains, projects, users, or roles fails with the error
        • Failed to discover available identity versions when contacting http://controller:5000/v3. Attempting to parse version from URL.
      • Evidence:
        • tail /var/log/apache2/keystone.log
          • Target WSGI script not found or unable to stat: /usr/bin/keystone-wsgi-public
    • Workaround: use 2025.1 instead
    • Completed 2026-01-18
  • Image service
    • Bad Documentation
      • The guide has you create three API endpoints for the service.
        • You need to configure access to keystone with one of them, but you are not told which one; only the public one works (see the glance-api.conf sketch below this list).
      • Configuring glance-api.conf is done haphazardly in the guide
        • The config file's options are organized alphabetically; the guide's instructions are not.
    • Completed 2026-01-19
  • Placement service
    • Bad Documentation
      • If you followed the guide, your user account does not have the rights to read /etc/placement/placement.conf
      • Running placement-status upgrade check as root proves the service works.
      • Undocumented requirement fulfilled: add the account to the placement group via usermod -aG placement (see the sketch below this list).
    • Completed 2026-01-20
  • management portions of Compute
    • Bad Documentation
      • Configuring nova.conf is done haphazardly in the guide
        • The config file's options are organized alphabetically; the guide's instructions are not.
      • The guide has you configure options for the Networking service, which you have not installed yet because the guide makes you install Compute first
      • Due to a packaging bug, remove the log_dir option from the [DEFAULT] section.
        • ???? THEN FIX THE PACKAGE?!?!?!!!!
      • The [glance] option you are instructed to use is deprecated
    • Completed 2026-01-20
  • management portion of Networking
    • Bad Documentation
      • Configuring neutron.conf is done haphazardly in the guide
        • The config file's options are organized alphabetically; the guide's instructions are not.
    • More Bad Documentation
      • The guide refers you to further information on configuring the Open vSwitch agent, and that information directly contradicts the guide itself (see the neutron.conf fragments below this list).
        • The guide says to edit neutron.conf with service_plugins = router
        • The Open vSwitch agent example configuration for controllers says: "Disable service plug-ins because provider networks do not require any."
      • Configuring openvswitch_agent.ini is done haphazardly in the guide
        • The config file's options are organized alphabetically; the guide's instructions are not.
        • The guide has you configure the name of the bridge connected to the underlying provider physical network, but you have not yet created this bridge at the point where the guide asks for its name.
    • Completed 2026-01-21
  • various Networking agents
  • Dashboard
    • Extremely weird behavior: the Dashboard will only load if DEBUG is set to True and compression is turned on.
    • Completed 2026-01-21
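
Regarding the Image service endpoint note above: assuming it refers to the keystone endpoint that glance authenticates against, the working part of /etc/glance/glance-api.conf looks roughly like the stock guide's fragment pointed at the public keystone URL. controller and GLANCE_PASS are the guide's usual placeholders, not our real values.

  [keystone_authtoken]
  # per the note above, only the public endpoint works
  www_authenticate_uri = http://controller:5000
  auth_url = http://controller:5000
  memcached_servers = controller:11211
  auth_type = password
  project_domain_name = Default
  user_domain_name = Default
  project_name = service
  username = glance
  password = GLANCE_PASS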
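
The Placement workaround above, spelled out as a sketch (ADMIN_USER stands for whichever account you run the client as; the placement group is the one created by the Ubuntu packages):

  # let the admin account read /etc/placement/placement.conf
  sudo usermod -aG placement ADMIN_USER
  # re-login (or run newgrp placement) so the membership applies, then verify:
  placement-status upgrade check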
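
The Networking contradiction above boils down to these two /etc/neutron/neutron.conf variants (the guide's instruction versus its Open vSwitch example); shown here only to illustrate the conflict:

  # what the install guide tells you to set
  [DEFAULT]
  service_plugins = router

  # what the Open vSwitch provider-network example says to set instead
  [DEFAULT]
  service_plugins =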

Communications