MilliwaysStack
We want to run an OpenStack experiment
The grander idea
We want to try out an installation of OpenStack to give people around milliways experience with running it (and running things on it).
From an unnamed source we got 10 HPE servers. We will use 8 of them to run OpenStack. Storage is on a separate machine.
MVP
The MVP would be:
- Kubernetes / Docker
- Object storage
- File systems
- Networking
- Virtual machines
- Firewalling
- Databases - MariaDB / PostgreSQL
- Something something Redis, I guess
- Container registry
e-MVP
The extended MVP would be:
- Functional monitoring & alerting
- Autoscaling
- Integration into the milliways identity & access management (Authentik)
- Logging & alerting
The software stack explained
OpenStack is a cloud framework stack that offers services similar to AWS / Azure / GCP.
Most documentation is available for Ubuntu & Red Hat. In the longer term an installation under NixOS might be feasible.
Asset List
Rack
- 47U
- 950mm external depth
- 915mm internal depth
Switches
- 2 x Dell PowerConnect 7048R-RA
- 1 x Cisco 3560e
- 1 x Dell PowerEdge R710 server as storage
  - 2 x X5570 2.93GHz
  - 192GB RAM
  - 6 x 3.5" bays
  - 6 x 3.5" drive sleds/brackets
  - Drives
    - 1 x Samsung 850 EVO 500GB (for OS)
    - We have more drives than bays, but not enough drives to make a nice or ideal configuration. As such, the Dell storage situation is likely temporary until the RMA'd Seagate drives return and we can figure out whether to add more 12T or 10T drives. (See the smartctl sketch after this asset list for how the test results below were obtained.)
    - 2 x Seagate Exos X16 12TB
      - Passes SMART short test
      - Fails SMART long test
      - RMA'd to Seagate
    - 1 x Seagate Exos X16 10TB
      - Passes SMART short test
      - Fails SMART long test
      - RMA'd to Seagate
    - 2 x Seagate Exos X18 12TB
    - 1 x Seagate Exos X18 10TB
    - 4 x WD Red 4TB
    - 4 x WD Green 3TB
    - 1 x Samsung 850 EVO 500GB
  - no rails
- 2 x HPE ProLiant DL380 Gen 8
  - 2 x E5-2620 v3 2.4GHz
  - 384GB RAM
  - PCI riser to 4* NVMe adapter
  - 1TB Crucial NVMe
  - iLO4
    - It seems it accepts 35DPH-SVSXJ-HGBJN-C7N5R-2SS4W as activation key for an iLO Advanced license?
  - without hard drives, but has 2.5" bays
    - no drive sleds/brackets available, only blanks
  - Slide rails
- 8 x HPE ProLiant DL380 Gen 8
  - 2 x E5-2620 v3 2.4GHz
  - 384GB RAM
  - iLO4
  - without hard drives, but has 2.5" bays
    - no drive sleds/brackets available, only blanks
  - 7 x slide rails
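For reference, the SMART pass/fail results in the drive list above can be reproduced with smartctl from smartmontools. A minimal sketch; the device name /dev/sda is an example and depends on how the drives enumerate:

  smartctl -t short /dev/sda   # start the short self-test (roughly 2 minutes)
  smartctl -t long /dev/sda    # start the extended self-test (many hours on a 12TB Exos)
  smartctl -a /dev/sda         # print SMART data, including the self-test log with pass/fail results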
Shopping List
It's ofc. sexy as all hell to buy memory, AI cards, flash storage and all sorts, but literally none of that will ever work if we don't have our generic basics in order. While we prefer big donations go to big-ticket items, many small-ticket items unexpectedly add up in the long run. Please do not forget the generic basics!
- Generic Basics
- PDU
  - Temporary: 1U unmanaged PDU with 16A/230V C19 input and 1* C19 + 8* Type F outlets
  - Perfect: managed rack-mountable PDU with CEE red 16A/20A 400V input to C13/C14 + C19/C20 outlets (see the power budget note at the end of this shopping list)
  - Alternatively: a "normal" server rack PDU (still strongly prefer managed) + a 16A/20A 400V -> 16A 230V transformer
- Network Cables
  - [Color]
    - [Type], [Amount], [Length]
  - [Color]
- Power Cables
  - [Type], [Amount], [Length]
- Screws, Nuts, Bolts
  - Assorted M2, M2.5, M3 screws
- PCI Risers
  - Single NVMe adapters
  - Multi NVMe adapters
- KVM
  - PiKVM?
- Dell - Storage
  - 2* Drive sleds
  - New RAID card that supports passthrough
  - 2* SFF-8087 -> SFF-8087 Mini SAS cable
  - Drives
    - 500GB SSD for OS
    - Bracket and SATA cable adapter for SSD
    - 12T ?
- HP1 - Control
  - 1* PCI riser to 4* NVMe adapter
  - 1* 1TB NVMe
- HP2 - Compute
  - 1* PCI riser to 4* NVMe adapter
  - 1* 1TB NVMe
- Flash Storage
  - 1 x 2TB Samsung 860 EVO
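A rough power budget note on the PDU options above (nominal numbers, nothing measured): a single-phase 16A/230V feed caps out at 230 V * 16 A ≈ 3.7 kW, while a CEE red 16A three-phase feed provides 3 * 230 V * 16 A ≈ 11 kW spread across its phases, which is presumably why the 400V input is listed as the "perfect" option.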
Documentation
nb. this is quick 'n' dirty as I go along. In the short-term future I'd much rather replace this ad-hoc documentation with something like NetBox.
Network
- Supernet 10.42.0.0/16
- VLAN 42
  - Interconnect
  - 10.42.0.0/30
  - Gateway 10.42.0.1
  - Milliways Core 10.42.0.2
- VLAN 5
  - Mgmt \ OOB
  - 10.42.1.0/24
  - Milliways Core 10.42.1.1
  - Dell iDRAC 10.42.1.5
  - Dell RAID Controller 10.42.1.6
  - HP 1 iLO 10.42.1.7
  - HP 2 iLO 10.42.1.8
- VLAN 10
  - Prod
  - 10.42.10.0/24
  - Milliways Core 10.42.10.1
  - Dell 10.42.10.2
  - HP 1 10.42.10.3
  - HP 2 10.42.10.5
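As a sketch of how a server joins these VLANs under this plan, assuming the switch ports trunk the tagged VLANs to the hosts (the NIC name eno1 is a placeholder; the addresses are HP 1's Prod assignment from the table above):

  ip link add link eno1 name eno1.10 type vlan id 10   # tagged sub-interface for VLAN 10 (Prod)
  ip addr add 10.42.10.3/24 dev eno1.10                # HP 1's address in the Prod subnet
  ip link set eno1.10 up
  ip route add default via 10.42.10.1                  # Milliways Core as gateway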
Cable Mgmt
As there are some early ambitions to physically take this environment to events, perhaps we should seriously think about making our lives easier by color-coding connectivity now. While this will help us reconnect everything at $event when we're sleep-deprived \ drunk \ explaining to newbies, it has the added effect of making it all look slightly cooler than just a spaghetti of boring white cables or, worse, a spaghetti of whatever the fuck we have lying around.
This is all just made up without too much thought. It is specifically intended to start a discussion so we can work toward an agreement, not to be a unilateral decision. Example: you'll notice 0 thought was put into fiber or not ;)
- RED
  - Mgmt \ OOB
  - iDRACs, iLOs, RAID cards, etc
- GREEN
  - Storage Prod
  - At least the Dell, maybe the HPs if we get into flash storage
- BLUE
  - Compute Prod
  - Likely overwhelmingly the HPs
- YELLOW
  - Interconnect
  - Connectivity to $outside, between switches, whatever
Naming Convention
We need names! Can't keep calling these "Dell", "HP1", "HP2" etc. Calling them by their S/Ns is also super boring and cumbersome; "Oh yea, we need to set up 5V6S064". We could even opt for dual names. Internally, when logged in to $shell, the names could be functional, like "milliways-control-node-1", so it's clear what you're doing; externally, the asset tag could be a Hitchhiker's Guide to the Galaxy character or a Discworld town or something. That way, if we do ever show this off at events, we can do cool shit with light-up tags and make stuff funny, recognizable and cool to talk about. It also makes it way more relatable when asking for donations; "Ya, we're looking for extra storage for Überwald" sounds much better than "Ya, we're looking for extra storage for 5V6S064 or milliways-control-node-1". Naturally, once we get NetBox going, we can map the asset names to the actual server names and potentially their serials so we don't get confused internally (if we want to use serials; there's something to be said for not using serials here).
- Functional
  - milliways-control-node-1
  - milliways-control-node-2
  - control-node-1
  - compute-node-1
  - flash-storage-1
- Marketing
  - HGttG characters
    - Arthur
    - Ford
    - Zaphod
  - Discworld locations
    - Ankh-Morpork
    - Überwald
    - Lancre
OpenStack
We're using 2025.1 (Epoxy), as 2025.2 (Flamingo) has an undocumented breaking change that makes installation of Keystone impossible. We have registered a bug against the documentation on Launchpad for this.
Following the installation guide's recommendation, passwords are created with openssl rand -hex 10 and saved in a password store.
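For example, a sketch of how the secrets get generated; the service list here is an assumption, and getting each value into the password store is left to whichever store is in use:

  for svc in keystone glance placement nova neutron horizon; do
    echo "${svc}: $(openssl rand -hex 10)"   # one 20-hex-character secret per service
  done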
Controller
- Identity service
- Broken in 2025.2
- An upstream commit removes the WSGI scripts, ``keystone-wsgi-admin`` and ``keystone-wsgi-public``.
- Both scripts are still called, so running any openstack command to create a domain, projects, users, or roles fails with the error:
Failed to discover available identity versions when contacting http://controller:5000/v3. Attempting to parse version from URL.
- Evidence:
tail /var/log/apache2/keystone.log
Target WSGI script not found or unable to stat: /usr/bin/keystone-wsgi-public
- Workaround: use 2025.1 instead (see the sanity check after this list)
- Completed 2025-01-18
- Image service
- Completed 2025-01-19
- Placement service
- management portions of Compute
- management portion of Networking
- various Networking agents
- Dashboard
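A quick sanity check for the 2025.1 Keystone workaround above: after sourcing admin credentials (the admin-openrc file name is the install guide's convention), requesting a token exercises the Keystone endpoint end to end:

  . admin-openrc          # load the OS_* environment variables
  openstack token issue   # succeeds on 2025.1; fails on 2025.2 with the error quoted above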