Monthly Archives: May 2020

Metadata on App Engine

There are various bits of metadata about your project and the current request available to your Web app when running on Google App Engine. Let’s see what they are.

These notes show how to access things using Python 3, but they apply to any App Engine standard runtime (except for the older Python 2.7, Go and Java runtimes).

Additional fields in the WSGI request environ

A request to your app results in your code being invoked with a WSGI environ object that captures the request as a series of key / value pairs describing the requested URL, the headers sent by the client, and any data sent by the client.

As well as standard variables such as HTTP_HOST and QUERY_STRING, apps on App Engine receive additional headers that are inserted by the App Engine service itself (not sent by the client). When present, you can rely on these headers being set by App Engine itself, they cannot be set by a malicious client trying to trick your app.

These extra header names all start with HTTP_X_APPENGINE. Here’s a table of the names and example values,

Key Example value
HTTP_X_APPENGINE_CITY london
HTTP_X_APPENGINE_CITYLATLONG 51.507351,-0.127758
HTTP_X_APPENGINE_COUNTRY GB
HTTP_X_APPENGINE_DEFAULT_VERSION_HOSTNAME app-engine-example.appspot.com
HTTP_X_APPENGINE_HTTPS on
HTTP_X_APPENGINE_REGION eng
HTTP_X_APPENGINE_REQUEST_LOG_ID 5ec7d80400ff0736739830b90c0001737e64627578746f6e2d706573740001696e626a756e64000820
HTTP_X_APPENGINE_USER_IP 33.53.215.240

Those location-related headers are cool! In my experience they tend to be fairly accurate, but there are times Google is unable to determine a location. Note that HTTP_X_APPENGINE_REGION describes the requesting user’s region, not the region/zone where your app is hosted.

The HTTP_X_APPENGINE_DEFAULT_VERSION_HOSTNAME header is interesting because in February 2020 App Engine moved to having a short region code in *.appspot.com host names. I wonder if projects created prior to the introduction of the new scheme all have the old-style host names in this header.

How come the headers from the request are prefixed with HTTP_? Because WSGI extends CGI.

Environment variables

Your code has access to a regular Unix-style environment. App Engine uses this to share details of the runtime. Here’s a table with some of those keys and example values. See the documentation for environment variables for a complete list.

Key Example value
GAE_APPLICATION s~app-engine-example
GAE_DEPLOYMENT_ID 426619129872753442
GAE_ENV standard
GAE_INSTANCE 0c61b117cf53359b13df64f50b43903717d4d1d624f37d602146c9642e9ab832ea12a3a
GAE_MEMORY_MB 256
GAE_RUNTIME python37
GAE_SERVICE default
GAE_VERSION 20200221t144119
GOOGLE_CLOUD_PROJECT app-engine-example
PYTHONDONTWRITEBYTECODE 1

Interesting things:

  • GAE_APPLICATION is your project name, but prefixed with a letter and tilde. That prefix identifies the App Engine region your app belongs to, which you must choose the first time you create the App Engine service in your project. You can’t change the region later. I swear there’s documentation listing these 1 letter prefixes, but I can’t find it now.
  • GAE_SERVICE and GAE_VERSION identify the version of your app that is running. Well useful if you think everything must be divided into micro-services (it should not) or if you deploy multiple versions to test things before setting the default version (you should).
  • PYTHONDONTWRITEBYTECODE is a Python-specific thing that prevents the creation of *.pyc files and __pycache__ directories. In theory your app would start a little faster on the first request if this option was turned off, but in practise it doesn’t matter, and this makes life easier for how App Engine runs your code. Given that, I don’t get why the Google Cloud blog highlights the parallel filesystem cache feature in their Python 3.8 beta announcement. Maybe it is enabled on that runtime, I haven’t tested it myself.

For the older Python 2.7 standard runtime, the OS environment includes all the request things as well. Again, because CGI.

The metadata service

Google’s Compute Engine metadata service is there on App Engine too (but not for the older Python 2.7 standard runtime). Except App Engine doesn’t get all the same things, and it is read-only.

The metadata service runs as an HTTP server. Your code makes requests to http://metadata.google.internal/computeMetadata/v1/ and descendant paths. On App Engine the metadata service exposes information about service accounts, the project’s zone, and the project’s ID.

The service account path is particularly useful, covering some of what used to be available with the google.appengine.api.app_identity APIs. If your code uses the google-auth package, you can get credentials with a short-lived access token for the default service account via google.auth.default() which in turn gets the token from the metadata service.

But if you want to change the OAuth scopes of the access token, you can go get it yourself. This Python snippet gets a token with scopes for the Google Spreadsheets API:

import requests

url = ‘http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token
headers = {‘Metadata-Flavor’: ‘Google’}
scopes = [‘https://www.googleapis.com/auth/spreadsheets‘]
params = {‘scopes’: ‘,’.join(scopes)}
response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
token = response.text

These are the paths currently exposed by the App Engine metadata service, relative to http://metadata.google.internal/computeMetadata/v1/, with example values:

Path Example value
instance/service-accounts/default/aliases default
instance/service-accounts/default/email app-engine-example@appspot.gserviceaccount.com
instance/service-accounts/default/identity?audience=foo JWT token
instance/service-accounts/default/scopes https://www.googleapis.com/auth/appengine.apis

https://www.googleapis.com/auth/cloud-platform
https://www.googleapis.com/auth/cloud_debugger
https://www.googleapis.com/auth/devstorage.full_control
https://www.googleapis.com/auth/logging.write
https://www.googleapis.com/auth/monitoring.write
https://www.googleapis.com/auth/trace.append
https://www.googleapis.com/auth/userinfo.email
|
instance/service-accounts/default/token | {
"access_token": "token",
"expires_in": 1799,
"token_type": "Bearer"
} |
instance/zone | projects/123456789012/zones/us16 |
project/numeric-project-id | 123456789012 |
project/project-id | app-engine-example |

If your project has multiple service accounts, there will be multiple entries under the instance/service-accounts path.

You can get all this (except for the access and identity tokens) with 1 request:

import requests

url = ‘http://metadata.google.internal/computeMetadata/v1/
headers = {‘Metadata-Flavor’: ‘Google’}
params = {‘recursive’: ‘true’}
response = requests.get(url, headers=headers, params=params)
response.raise_for_status()
data = response.json()

It annoys me that the zone is provided here, but not the region, and that there is no documented way to derive the longer region identifier from a short zone identifier (as far as I am aware). But looks like that will change soon!

Conclusion

Getting information about the App Engine runtime environment turns out to be incredibly useful for your app because you can remove hard-coded assumptions from your code. The metadata service is particularly interesting because it allows more flexible response types (as opposed to strings in Unix environment variables) and a clear path for Google to extend it without fear of breaking your crazy code.