MilliwaysStack
We want to run an OpenStack experiment
The grander idea
We want to try out an installation of OpenStack to give people around Milliways experience with running it (and running things on it).
From an unnamed source we got 10 HPE servers. We will use 8 of them to run OpenStack. Storage is on a separate machine.
MVP
The MVP would be:
- Kubernetes / Docker
- object storage
- file systems
- Networking
- Virtual machines
- Firewalling
- Databases - MariaDB / PostgreSQL
- Something something Redis, I guess
- container registry
e-MVP
The extended MVP would be:
- functional Monitoring & alerting
- autoscaling
- integration into Milliways identity & access management (authentik)
- logging & alerting
The software stack explained
OpenStack is a cloud framework stack that offers services similar to AWS / Azure / Google Cloud.
Most documentation is available for Ubuntu & Red Hat. In the longer term an installation under NixOS might be feasible.
Asset List
Rack
- 47U
- 950mm external depth
- 915mm internal depth
Consumables & Small Materials
- 1 x Samsung 860 EVO 2TB
- Assorted M2 - M3 screws
- Assorted mismatched bundle of M5 and M6 cagenuts and bolts
- SFPs
Switches
- 2 x Dell PowerConnect 7048R-RA
- 1 x Cisco 3560e
- 1 x Dell PowerEdge R710 server as storage
- 2 x X5570 2,93GHz
- 192GB RAM
- 6 x 3,5" bays
- 6 x hotswap 3,5" drive sleds/brackets
- Drives
- 1 x Samsung 850 EVO 500GB
- for OS
- Hidden in aftermarket "Optical Drive" adapter.
- We have more drives than bays, but not enough drives to make a nice or ideal configuration. As such, the Dell storage situation is likely temporary until we figure out whether to add more 12TB or 10TB drives or keep it as-is.
- 2 x Seagate Exos X18 12TB
- 1 x Seagate Exos X18 10TB
- 4 x WD Red 4TB
- 4 x WD Green 3TB
- 1 x Samsung 850 EVO 500GB
- no rails
- 2 x HPE ProLiant DL380 Gen8
- 2 x E5-2620 v3 2,4GHz
- 384GB ram
- PCI Riser to 4* NVMe adapter
- 1TB Crucial NVMe
- iLO4
- It seems to accept 35DPH-SVSXJ-HGBJN-C7N5R-2SS4W as an activation key for the iLO Advanced license?
- without hard drives but has 2,5" bays
- no drive sleds/brackets available, only blanks
- Slide rails
- 8 x HPE ProLiant DL380 Gen8
- 2 x E5-2620 v3 2,4GHz
- 384GB ram
- iLO4
- without hard drives but has 2,5" bays
- no drive sleds/brackets available, only blanks
- 7 x slide rails
Shopping List
It's ofc. sexy as all hell to buy memory, AI cards, flash storage and all sorts, but literally none of that will ever work if we don't have our generic basics in order. While we prefer big donations go to big-ticket items, many small-ticket items unexpectedly add up in the long run. Please do not forget the generic basics!
- Generic Basics
- PDU
- Temporary: 1U unmanaged PDU with 16A/230V C19 input and 1* C19 + 8* Type F outlets.
- Perfect: Managed rack-mountable PDU with CEE Red 16A/20A 400V input to C13/C14 + C19/C20 outlets.
- Alternatively: a "normal" server rack PDU (still strongly prefer managed) + 16A/20A 400V -> 16A 230V transformer
- Network Cables
- Some actual properly matching cables would be great
- [Color]
- [Type],[Amount],[Length]
- Power Cables
- Some actual properly matching cables would be great
- [Type],[Amount],[Length]
- Screws, Nuts, Bolts
- Assorted M2, M2.5, M3 screws
- Some actual properly matching cage nuts\bolts would be great
- PCI Risers
- Single NVMe adapters
- Multi NVMe adapters
- KVM
- PiKVM?
- Dell - Storage
- 2* Drive sleds
- New RAID Card that supports passthrough\JBOD
- 2* SFF-8087 -> SFF-8087 Mini SAS Cable
- Drives
- 500GB SSD for OS
- Bracket and SATA Cable Adapter for SSD
- 12T ?
- HP1 - Control
- 1* PCI riser to 4* NVMe adapter
- 1* 1TB NVMe
- HP2 - Compute
- 1* PCI riser to 4* NVMe adapter
- 1* 1TB NVMe
- Flash Storage
- We'll need
- 1 x 2TB Samsung 860 EVO
Documentation
N.B. this is quick 'n' dirty as I go along. In the short-term future I'd much rather replace this ad hoc documentation with something like NetBox.
Network
- Supernet 10.42.0.0/16
- Vlan 42
- Interconnect
- 10.42.0.0/30
- Gateway 10.42.0.1
- Milliways Core 10.42.0.2
- Vlan 5
- Mgmt \ OOB
- 10.42.1.0/24
- Milliways Core 10.42.1.1
- Dell iDRAC 10.42.1.5
- Dell RAID Controller 10.42.1.6
- HP 1 iLO 10.42.1.7
- HP 2 iLO 10.42.1.8
- Vlan 10
- Prod
- 10.42.10.0/24
- Milliways Core 10.42.10.1
- Dell 10.42.10.2
- HP 1 10.42.10.3
- HP 2 10.42.10.5
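To make the addressing plan above concrete, here is a minimal sketch of putting one box on the Prod network (example only: the interface name eno1 is a placeholder, the Dell's 10.42.10.2 is just picked from the table above, and using the Milliways Core address as default gateway is an assumption since routing isn't documented yet):
 # tag Vlan 10 (Prod) on a trunk port; interface name is a placeholder
 ip link add link eno1 name eno1.10 type vlan id 10
 ip addr add 10.42.10.2/24 dev eno1.10
 ip link set eno1.10 up
 # default via the Milliways Core address (assumption)
 ip route add default via 10.42.10.1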
Cable Mgmt
As there are some early ambitions to physically take this environment to events, perhaps we should seriously think about making our lives easier by color-coding connectivity now. This will help us reconnect everything at $event when we're sleep-deprived\drunk\explaining to newbies, and it has the added effect of making it all look slightly cooler than a spaghetti of boring white cables or, worse, a spaghetti of whatever the fuck we have lying around.
This is all just made up without too much thought. It is specifically intended to start a discussion so we can work toward an agreement, not to be a unilateral decision. Example: you'll notice zero thought was put into fiber or not ;)
- RED: Mgmt \ OOB (iDRACs, iLOs, RAID cards, etc.)
- GREEN: Storage Prod (at least the Dell, maybe the HPs if we get into flash storage)
- BLUE: Compute Prod (likely overwhelmingly the HPs)
- YELLOW: Interconnect (connectivity to $outside, between switches, whatever)
Naming Convention
We need names! We can't keep calling these "Dell", "HP1", "HP2" etc. Calling them by their S/Ns is also super boring and cumbersome; "Oh yea, we need to set up 5V6S064". We could even opt for dual names. Internally, when logged in to $shell, the names could be functional, e.g. "milliways-control-node-1", so it's clear what you're doing; externally, the Asset Tag could be a Hitchhiker's Guide to the Galaxy character or a Discworld town or something. That way, if we do ever show this off at events, we can do cool shit with light-up tags and make stuff funny, recognizable and cool to talk about. It also makes it way more relatable when asking for donations; "Ya, we're looking for extra storage for Überwald" sounds much better than "Ya, we're looking for extra storage for 5V6S064 or milliways-control-node-1". Naturally, once we get NetBox going, we can map the Asset names to the actual server name and potentially its serial so we don't get confused internally (if we want to use serials; there's something to be said for not using serials here).
- Functional
- milliways-control-node-1
- milliways-control-node-2
- control-node-1
- compute-node-1
- flash-storage-1
- Marketing
- HGttG characters
- Arthur
- Ford
- Zaphod
- Discworld locations
- Ankh-Morpork
- Überwald
- Lancre
OpenStack
We're using 2025.1 (Epoxy) as 2025.2 (Flamingo) has an undocumented breaking change making installation of keystone impossible. We have registered a bug against the documentation on Launchpad for this.
Following the installation guide's recommendation, passwords are created with openssl rand -hex 10 and saved in a password store.
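As a small sketch of how that looks in practice (the entry path and the use of pass as the password store are just examples, not a decision):
 # generate a secret the way the guide recommends
 KEYSTONE_DBPASS="$(openssl rand -hex 10)"
 # drop it into the team password store; the entry path is made up
 printf '%s\n' "$KEYSTONE_DBPASS" | pass insert -m milliways/openstack/KEYSTONE_DBPASS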
Controller
- Identity service
- Broken in 2025.2
- This commit removes the WSGI scripts, keystone-wsgi-admin and keystone-wsgi-public.
- Both scripts are still called by the openstack command. This means running any openstack command to create a domain, projects, users, and roles fails with the error
Failed to discover available identity versions when contacting http://controller:5000/v3. Attempting to parse version from URL.
- Evidence: tail /var/log/apache2/keystone.log shows
 Target WSGI script not found or unable to stat: /usr/bin/keystone-wsgi-public
- Workaround, use 2025.1 instead
- Completed 2025-01-18
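As a quick smoke test that the 2025.1 keystone install actually works (assumes the admin-openrc credentials file from the install guide exists on the controller):
 # load admin credentials, then ask keystone for a token
 . admin-openrc
 # on 2025.1 this returns a token; on 2025.2 it fails with the version-discovery error above
 openstack token issue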
- Image service
- Bad Documentation
- The guide has you create 3 API endpoints for the service.
- You need to configure access to keystone with one of them, but you are not told which one. Only public will work.
- Configuring glance-api.conf is done haphazardly in the guide
- config options are organized alphabetically, the guide is not.
- Completed 2025-01-19
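A quick way to confirm the Image service ended up working despite the guide (assumes the same admin-openrc credentials as above):
 # an empty list, or the test image if you uploaded one per the guide, means glance and its endpoints answer
 openstack image list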
- Placement service
- Bad Documentation
- If you followed the guide, your user account does not have the rights to read /etc/placement/placement.conf
- Running placement-status upgrade check as root proves the service works.
- Undocumented requirement fulfilled; usermod -aG placement
- Completed 2025-01-20
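For reference, the commands involved, as a sketch (the account name is a placeholder; the point is that the config file is only readable by the placement group, which the guide never mentions):
 # as root: the service itself is healthy
 placement-status upgrade check
 # let a regular admin account read /etc/placement/placement.conf (user name is an example)
 usermod -aG placement youruser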
- management portions of Compute
- Bad Documentation
- Configuring nova.conf is done haphazardly in the guide
- config options are organized alphabetically, the guide is not.
- The guide attempts to make you configure options for the networking service, which you have not installed yet, because the guide has you install Compute first
- Due to a packaging bug, remove the log_dir option from the [DEFAULT] section.
- ???? THEN FIX THE PACKAGE?!?!?!!!!
- The [glance] option you are instructed to use is deprecated
- Completed 2025-01-20
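One way to apply that log_dir workaround without hand-editing, as a sketch (assumes the crudini package is installed; a text editor works just as well):
 # drop the problematic option from [DEFAULT] in nova.conf, per the packaging-bug note
 crudini --del /etc/nova/nova.conf DEFAULT log_dir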
- management portion of Networking
- Bad Documentation
- Configuring neutron.conf is done haphazardly in the guide
- config options are organized alphabetically, the guide is not.
- various Networking agents
- Dashboard