- Create a VPS on horizon.wikimedia.org
  - 24 core, 122GB RAM machine
  - Debian 11
  - `webservice` security group
- Configure a Cinder volume on horizon.wikimedia.org
  - 5000 GB
  - attach to the VPS
- Add a Web Proxy on Horizon (`wikiwho.wmcloud.org` or something else from the `settings_wmcloud.py` hosts entry)
  - Also do the same for `wikiwho-flower.wmcloud.org`
- SSH into the VPS
- Prepare the Cinder volume.
  - `sudo wmcs-prepare-cinder-volume`
    - mount it to `/pickles`
  - `sudo mkdir -p /pickles/{en,eu,es,de,tr}`
  - `sudo chown -R wikiwho /pickles`
- Clone the repo and make `wikiwho` own it:
  - `git clone https://github.com/wikimedia/wikiwho_api.git /home/wikiwho/wikiwho_api`
  - `chown -R wikiwho:wikiwho /home/wikiwho/wikiwho_api`
- Run the customization script:
  - `sudo sh wikimedia_cloud_customization_script.sh`
- Add the password for the Flower UI:
  - `sudo htpasswd -c /etc/apache2/.htpasswd wikiwho`
- Create the wikiwho postgres user and database:
  - `sudo su postgres`
  - `psql`
  - `create user wikiwho with password 'wikiwho';`
  - `create database wikiwho;`
  - `grant all privileges on database wikiwho to wikiwho;`
  - `exit;`
  - `exit`
- Become the wikiwho user:
  - `sudo su wikiwho`
  - `cd /home/wikiwho/wikiwho_api`
- Set up a virtualenv:
  - `python3 -m venv env`
  - `. env/bin/activate`
- Install the Python dependencies:
  - `pip install -r requirements.txt -r requirements_local.txt -r requirements_test.txt`
- Create `wikiwho_api/settings.py` (in the wikiwho_api subdirectory, not the top git directory), with an import from `settings_wmcloud` plus SECRET_KEY, WP_CONSUMER_TOKEN, WP_CONSUMER_SECRET, WP_ACCESS_TOKEN, WP_ACCESS_SECRET, and DATABASES.
  - Generate a secret key:
    - `python manage.py generate_secret_key`
- Run the migrations and collect the static files:
  - `python manage.py migrate`
  - `python manage.py collectstatic --noinput -c`
- As a user with sudo, start the Gunicorn webserver:
  - `sudo systemctl enable ww_gunicorn`
  - `sudo systemctl start ww_gunicorn`
  - `sudo systemctl status ww_gunicorn` to check if it's running
- API and homepage should be working now.
- Start Celery:
  - `sudo systemctl enable ww_celery`
  - `sudo systemctl start ww_celery`
  - `sudo systemctl status ww_celery` to check if it's running
- Import dumps (as user `wikiwho`)
  - `sudo su wikiwho`
  - `mkdir -p /pickles/{en,eu,es,de,tr}`
  - Download the latest dumps for each of the languages to import, e.g.:
    - `cd /pickles/dumps/en`
    - New format (bz2): `wget -r -np -nd -c -A bz2 https://dumps.wikimedia.org/other/mediawiki_content_history/enwiki/{datestamp}/xml/bzip2/`
    - Legacy format (7z, deprecated): `wget -r -np -nd -c -A 7z https://dumps.wikimedia.your.org/enwiki/20211201/`
  - For each language, generate pickles from the XML dumps, e.g.:
    - `cd ~/wikiwho_api`
    - `. env/bin/activate`
    - `nohup python manage.py generate_articles_from_wp_xmls -p '/pickles/dumps/en/' -t 30 -m 24 -lang en -c`
- Start Flower and event_stream services
  - `sudo systemctl enable ww_flower.service`
  - `sudo systemctl start ww_flower.service`
  - `sudo systemctl status ww_flower.service` to check if it's running
  - `sudo systemctl enable ww_events_stream.service`
  - `sudo systemctl start ww_events_stream.service`
  - `sudo systemctl status ww_events_stream.service` to check if it's running
  - `sudo systemctl start ww_events_stream_deletion.service`
  - `sudo systemctl status ww_events_stream_deletion.service` to check if it's running
- Add a cronjob to restart services daily (see T344936 for more information).
  - `sudo su root`
  - `crontab -e`
  - Add the entry `0 0 * * * /home/wikiwho/wikiwho_api/cron/restart_services.sh`
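Once the setup steps above are done, it can be handy to confirm in one pass that every service is active. The following is a minimal sketch, assuming a POSIX shell; the function name `check_ww_services` and the `WW_SERVICES` variable are hypothetical helpers, not part of wikiwho_api. The unit names are the ones used in the steps above.

```shell
# Unit names taken from the setup steps above.
WW_SERVICES="ww_gunicorn ww_celery ww_flower ww_events_stream ww_events_stream_deletion"

# Print one status line per service.
# Returns non-zero if at least one service is not active.
check_ww_services() {
    failed=0
    for svc in $WW_SERVICES; do
        # `systemctl is-active --quiet` exits 0 only when the unit is active.
        if systemctl is-active --quiet "$svc"; then
            echo "ok      $svc"
        else
            echo "FAILED  $svc"
            failed=1
        fi
    done
    return "$failed"
}
```

Running `check_ww_services` on the VPS prints one line per unit. For any unit reported as `FAILED`, `sudo systemctl status` on that unit shows the details.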
- Download the dumps into a volume (new languages most likely should go in the new `pickle_storage02`, mounted to `/pickles-02`)
  - `mkdir /pickles-02/{lang}`
  - `mkdir /pickles-02/dumps/{lang}`
  - `cd /pickles-02/dumps/{lang}`
  - `screen`
  - Download dumps (either format works):
    - New format (bz2, preferred): `wget -r -np -nd -c -A bz2 https://dumps.wikimedia.org/other/mediawiki_content_history/{lang}wiki/{datestamp}/xml/bzip2/`
    - Legacy format (7z, deprecated): `wget -r -np -nd -c -A 7z https://dumps.wikimedia.org/{lang}wiki/{datestamp}/`
      - Use the latest complete dump. Newer versions may be available at https://dumps.wikimedia.your.org
      - If you get an error or otherwise no files were downloaded, the dump may be incomplete. Try using an older dump.
  - Then hit Ctrl+A and the `d` key to detach from screen and keep the dump download running in the background.
  - When you think it may be finished, verify by reentering the screen session with `screen -r`, then type `exit` if it's finished or use Ctrl+A and `d` to detach again.
- Create a pull request to add the new language to the app, except for EventStreams (example PR).
  - The migrations can be created with `python manage.py makemigrations rest_framework_tracking api --empty`, and then fill in the code accordingly, using previous migrations as a guide. These migrations may eventually not be necessary, pending the outcome of T335322.
- Start the import process on the VPS instance:
  - `sudo su wikiwho`
  - `cd ~/wikiwho_api`
  - `git pull origin main`
  - `. env/bin/activate`
  - `python manage.py migrate`
  - `nohup python manage.py generate_articles_from_wp_xmls -p '/pickles/dumps/{lang}/' -t 30 -m 24 -lang {lang} -c` then Ctrl+Z and then enter `bg` to background the process.
  - After typing `top`, you should see ~24 `python` processes running. You can monitor progress with `ls -al /pickles-02/{lang}/ | wc -l`; that number should eventually roughly equal the total number of articles on the wiki. Note this command will run very slowly once there are hundreds of thousands or millions of pickle files.
- Once complete, create a PR to add the wiki to EventStreams (example PR).
- Deploy and restart services (using your account and not `wikiwho`):
  - Pull in latest changes
  - Restart the Flower and EventStreams services with `sudo systemctl restart ww_flower.service` and `sudo systemctl restart ww_events_stream.service`
  - Restart Celery with `sudo systemctl restart ww_celery.service`
- Update clients accordingly (XTools, Who Wrote That?, Programs & Events Dashboard, etc.)
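The dump URL and the progress check from the steps above can be wrapped in two small helpers. This is a minimal sketch, assuming a POSIX shell and a `find` that supports `-maxdepth` (GNU and BSD `find` both do). The function names `dump_url` and `count_pickles` are hypothetical and not part of wikiwho_api. The URL layout is the new-format (bz2) one quoted in the download step.

```shell
# Build the new-format (bz2) dump URL for a language and datestamp.
# $1: language code (e.g. en), $2: datestamp of the dump to download.
dump_url() {
    printf 'https://dumps.wikimedia.org/other/mediawiki_content_history/%swiki/%s/xml/bzip2/\n' "$1" "$2"
}

# Count the regular files directly inside a pickle directory.
# Unlike `ls -al | wc -l`, this skips the "total" line, "." and "..",
# and any subdirectories, and it does not sort or format a long listing.
# Assumes file names contain no newlines.
# $1: directory to count, e.g. /pickles-02/{lang}
count_pickles() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d '[:space:]'
}
```

For example, `wget -r -np -nd -c -A bz2 "$(dump_url "$lang" "$datestamp")"` reproduces the download command, with `lang` and `datestamp` set to the wiki and dump you picked. `count_pickles` may still take a while on directories with millions of files.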
Some tips to help troubleshoot issues in production:
- Check https://wikiwho-flower.wmcloud.org to monitor Celery tasks.
- Use `sudo journalctl -u ww_events_stream` to view the logs for the `ww_events_stream` service, or replace with another service name such as `nginx`.
- See also the Celery logs at `/var/log/celery/*.log`.
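When checking several services in a row, a small wrapper around the `journalctl` call above saves retyping. This is a minimal sketch, assuming a POSIX shell; the function name `ww_logs` and its default of 100 lines are hypothetical choices, not part of wikiwho_api.

```shell
# Show the most recent journal entries for one service.
# $1: unit name (e.g. ww_events_stream)
# $2: number of lines to show (optional, defaults to 100)
ww_logs() {
    unit=${1:?usage: ww_logs UNIT [LINES]}
    lines=${2:-100}
    # -n limits the output to the last N entries; --no-pager prints
    # straight to the terminal so the output can be piped or grepped.
    sudo journalctl -u "$unit" -n "$lines" --no-pager
}
```

For example, `ww_logs ww_celery 50` shows the last 50 journal entries for Celery, and `ww_logs ww_events_stream | grep -i error` narrows the EventStreams log down to lines mentioning errors.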