The goals for this week are to:
- Call a web service from the command line
- Be able to specify HTTP headers with curl requests
- Be able to manipulate JSON files
This week we will look at web services.
Start by looking at the human-readable version of a web page. Open your VM, then open the web browser and visit
https://orcid.org/0000-0002-1825-0097
This is (or at least should be) the only fictional person with an ORCID record. The page displays his name and some information about him.
Let's look at this under the hood. Open a new browser tab and go to this website:
https://base64.guru/tools/http-request-online
We will look at HTTP requests first with this tool, and then from the command line. Enter the previous ORCID URL into the URL box and choose HTTP request version 1.1.
We see the request that was sent. It has the HTTP method, GET, and a few headers. These headers are standard boilerplate.
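To make the structure of that request concrete, here is a minimal Python sketch that assembles the raw text of an HTTP/1.1 GET request by hand: a request line (method, path, protocol version), one "Name: value" line per header, and a blank line to finish. The function name build_request and the header values in the example are assumptions for illustration; the online tool may send different headers.

```python
def build_request(method: str, host: str, path: str, headers: dict[str, str]) -> str:
    """Assemble the raw text of an HTTP/1.1 request that has no body."""
    # Request line: method, path, and protocol version separated by spaces.
    # HTTP/1.1 requires a Host header, so it is always included.
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    # One "Name: value" line per additional header.
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP uses CRLF line endings; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    text = build_request(
        "GET",
        "orcid.org",
        "/0000-0002-1825-0097",
        {"User-Agent": "example-client/0.1", "Accept": "*/*"},  # illustrative values
    )
    print(text)
```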
Below that we see the response headers. The first line, the status line, has the response code, in this case "200 OK". Other headers describe the data: the Content-Type header says it is text/html.
There are other headers as well: some are important to the client, and some are for debugging.
Below the response headers is the response body, which contains the HTML text of the displayed web page. This shows the distinction between HTTP (the transport protocol) and HTML (the text that forms the "web page").
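The split between the HTTP envelope and the HTML payload can be sketched in a few lines of Python. This hypothetical parse_response function splits a raw HTTP/1.1 response at the first blank line, reads the status code from the status line, and collects the headers (lower-cased, since header names are case-insensitive). The sample response is made up and heavily abbreviated. Real clients also handle chunked transfer encoding, repeated headers, and binary bodies, all of which this sketch ignores.

```python
def parse_response(raw: str) -> tuple[int, str, dict[str, str], str]:
    """Split a raw HTTP/1.1 response into status code, reason, headers, body."""
    # The header section ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK".
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    sample = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html;charset=UTF-8\r\n"
        "\r\n"
        "<!DOCTYPE html><html>...</html>"
    )
    code, reason, headers, body = parse_response(sample)
    print(code, reason)             # transport-level result (HTTP)
    print(headers["content-type"])  # what kind of data the body holds
    print(body)                     # the page itself (HTML)
```

In this simplified version, repeated headers (for example several Set-Cookie lines) overwrite one another.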
We can add other headers to our request. Of course, if the server doesn't understand a header, it can ignore it or return an error; that is its choice.
Next I would look at this using the JSON response ORCID can provide, but the website now requires a sign-in before providing it. So, let's look at OpenAlex.
Surprisingly, there is no complete database of all academic scholarship. There are a few aggregators that try to index as much as they can, such as Google Scholar, DataCite Commons, and OpenAlex. There are also more specialized databases, such as PubMed for medical research.
OpenAlex is an open catalog of scholarly papers, people, datasets, institutions, and so on. In the browser, visit the page:
https://openalex.org/works/w2764299839
This, again, is a human-readable page provided by the catalog.
Let's try asking for a JSON representation. Add the header Accept: application/json by typing it into the box labeled "HTTP Request Headers". This tells the server that we don't want an HTML page; instead, we want a JSON-encoded response.
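The same idea in Python's standard library: urllib.request.Request lets you attach request headers before anything is sent. The standard HTTP header name for this is Accept. The sketch only builds the request and inspects it; actually sending it with urllib.request.urlopen(req) is left out, and the server is still free to ignore the header.

```python
import urllib.request

# Build (but do not send) a request that asks for JSON instead of HTML.
req = urllib.request.Request(
    "https://openalex.org/works/w2764299839",
    headers={"Accept": "application/json"},
)

print(req.get_method())          # GET, because the request has no body
print(req.get_header("Accept"))  # application/json
```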
In this case, we get a page that wants us to run JavaScript. This seems to be a newer technique to prevent bots from scraping data off a page.
But the information is all available at the API endpoint:
https://api.openalex.org/works/w2764299839
Now we get an interesting response.
The first line has a 302 response code.
This is the server telling us that we need to retry at a different URL.
The Location header tells us the new URL to use.
Why? It seems to want us to use a capital "W".
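A client that follows redirects does roughly this: if the status code is a redirect, read the Location header, resolve it against the current URL (it may be relative), and send a new request there. The Python sketch below covers only that "where next?" decision. The function name next_url, the assumption that header names have been lower-cased, and the relative Location value in the usage line are all illustrative assumptions, not necessarily what OpenAlex sends.

```python
from typing import Optional
from urllib.parse import urljoin

# Status codes that ask the client to retry somewhere else.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def next_url(current_url: str, status: int, headers: dict) -> Optional[str]:
    """Return the URL to retry at if the response is a redirect, else None.

    Assumes header names in `headers` are lower-case.
    """
    if status not in REDIRECT_CODES:
        return None
    # Location may be absolute or relative, so resolve it against the current URL.
    return urljoin(current_url, headers["location"])


if __name__ == "__main__":
    print(next_url("https://api.openalex.org/works/w2764299839", 302,
                   {"location": "/works/W2764299839"}))  # assumed Location value
```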
The second request returns a JSON response body.
It is all on one line.
Sometimes servers do this, since the line breaks are not needed to decode the JSON.
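To see that the line breaks really are optional, this Python sketch serializes the same small object twice, once compactly on a single line and once indented, and checks that both decode to the same value. The object reuses a few fields from the MeSH entry that appears later on this page.

```python
import json

record = {"descriptor_ui": "D009032", "descriptor_name": "Mosquito Control", "is_major_topic": True}

# Compact form: no extra whitespace, everything on one line.
compact = json.dumps(record, separators=(",", ":"))
# Pretty form: indented across several lines, easier for people to read.
pretty = json.dumps(record, indent=2)

print(compact)
print(pretty)
# Whitespace between tokens carries no meaning, so both decode to the same value.
assert json.loads(compact) == json.loads(pretty) == record
```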
Thinking about this architecture, why do you think the server used a redirect rather than just returning the JSON in the first place?
Copy the JSON response and paste it into this web page:
https://jqplay.org/
Paste it into the box labeled "JSON".
In the box labeled "Filter", enter . (a single period).
You will see a formatted version of the JSON appear in the box on the right. JSON is a simple way of structuring data to send between computers. Since it is text-based, it is easy for people to inspect. However, JSON has no support for comments, so it is not ideal for files that a person edits over time, such as configuration files.
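The lack of comments is easy to check with Python's json module, which follows the JSON specification strictly. In this small sketch, is_valid_json is a hypothetical helper and the "retries" setting is a made-up configuration value.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


config = '{\n  "retries": 3\n}'
commented = '{\n  // how many times to retry\n  "retries": 3\n}'

print(is_valid_json(config))     # True
print(is_valid_json(commented))  # False: // comments are not part of JSON
```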
There are 6 kinds of values in JSON:
- numbers
- strings
- true/false
- null
- objects
- arrays
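One way to see the six kinds side by side is to decode a small made-up document containing one of each and look at the Python type each value becomes.

```python
import json

doc = '{"n": 3.5, "s": "text", "t": true, "f": false, "z": null, "o": {"k": 1}, "a": [1, 2]}'
value = json.loads(doc)

# Python's json module maps each kind of JSON value to a built-in type.
for key, item in value.items():
    print(key, type(item).__name__)
```

JSON numbers become int or float, strings become str, true and false become bool, null becomes None, objects become dict, and arrays become list.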
Most JSON responses are an object, which is indicated by a matching pair of curly braces, {}. Inside the curly braces of an object is a list of key-value pairs separated by commas; each key is a string, separated from its value by a colon.
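In Python, a decoded JSON object becomes a dict, so its key-value pairs can be listed and individual keys looked up by name. This sketch uses the MeSH entry that appears later on this page as its input.

```python
import json

raw = ('{"descriptor_ui": "D009032", "descriptor_name": "Mosquito Control", '
       '"qualifier_ui": "Q000379", "qualifier_name": "methods", "is_major_topic": true}')

entry = json.loads(raw)

# Each key-value pair of the JSON object becomes a dict item.
for key, value in entry.items():
    print(f"{key} -> {value!r}")

# Looking up a single key by name.
print(entry["descriptor_name"])  # Mosquito Control
```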
All of the information in the HTML record should also appear in the JSON record.
Try entering .title in the Filter box.
You should see the following JSON:
"Citizen science provides a reliable and scalable tool to track disease-carrying mosquitoes"
Now try .mesh and you should see a big list (a JSON array).
Now try .mesh[3] (array positions count from 0, so this is the fourth entry):
{
"descriptor_ui": "D009032",
"descriptor_name": "Mosquito Control",
"qualifier_ui": "Q000379",
"qualifier_name": "methods",
"is_major_topic": true
}
The filter box takes a jq filter and returns the parts of the input that the filter selects.
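jq filters form a small language of their own, but the simplest ones, like .title or .mesh[3], are just paths into the data. As a rough sketch of what such a path does, here is a hypothetical Python function, select, that handles only .key and [n] steps plus the identity filter ".". Real jq supports far more (pipes, slices, functions, ".[0]" syntax, and so on), so treat this as an illustration of path lookup, not a jq implementation.

```python
import json
import re

# Each step is either .name (object key) or [n] (array index).
STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def select(value, path: str):
    """Apply a tiny subset of jq paths, such as '.mesh[3].descriptor_name'."""
    if path == ".":
        return value  # identity filter: the whole input
    pos = 0
    while pos < len(path):
        m = STEP.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported path syntax at {path[pos:]!r}")
        key, index = m.groups()
        # Keys index into objects (dicts); numbers index into arrays (lists), from 0.
        value = value[key] if key is not None else value[int(index)]
        pos = m.end()
    return value


if __name__ == "__main__":
    sample = json.loads('{"a": {"b": [10, 20, 30, 40]}}')
    print(select(sample, ".a.b[3]"))  # 40
```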
MeSH (Medical Subject Headings) are subject headings curated by the US National Library of Medicine. Let's look up this term:
https://www.ncbi.nlm.nih.gov/mesh/
Search for D009032.
This lets us share topic headings with others, and we can all agree on what they mean.
We can also agree on the codes used to represent each topic.
Vocabularies like MeSH are very useful, but each takes effort to develop and they all have a defined scope. Another useful place to define shared terms is Wikidata.
https://www.wikidata.org
And we can also find the Wikidata term for MeSH:
https://www.wikidata.org/wiki/Q2003646
Now let's do all this on the command line.
curl -H 'Accept: application/json' 'https://api.openalex.org/works/w2764299839'
This just returns the redirect. We need to ask "curl" to follow the redirects:
curl -L -H 'Accept: application/json' 'https://api.openalex.org/works/w2764299839'
We can see more information being passed with the -v ("verbose") option:
curl -v -H 'Accept: application/json' 'https://api.openalex.org/works/w2764299839' 2>&1 | less
Note that the request line and request headers are on lines starting with ">", and the response status line and headers are on lines starting with "<".
Let's save the JSON response:
curl -L -H 'Accept: application/json' 'https://api.openalex.org/works/w2764299839' > mosq.json
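For comparison, here is a minimal Python sketch of the same fetch-and-save step using only the standard library. urllib's default opener follows HTTP redirects such as the 302 seen earlier on its own, much like curl -L. The function names fetch_json and save_json are hypothetical, and the opener parameter exists only so the network call can be swapped out for testing. Unlike the curl command, which writes the server's bytes unchanged, save_json writes indented JSON.

```python
import json
import urllib.request

URL = "https://api.openalex.org/works/w2764299839"


def fetch_json(url: str, opener=urllib.request.urlopen):
    """GET url with an Accept: application/json header and return the decoded JSON.

    The default opener (urlopen) follows HTTP redirects automatically.
    """
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(req, timeout=30) as resp:
        return json.load(resp)


def save_json(value, filename: str) -> None:
    """Write value to filename as indented JSON."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(value, f, indent=2)
```

Usage would be save_json(fetch_json(URL), "mosq.json"), which needs network access to the OpenAlex API.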
The jq tool works on the command line as well. Quote the filter so the shell does not treat the square brackets as a filename pattern:
jq '.mesh[3]' mosq.json
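The same lookup can be done from a Python script, assuming mosq.json was saved as above. This is a sketch, not a replacement for jq. mesh_entry and show are hypothetical helper names: the first performs the two steps of .mesh[3] (look up the "mesh" key, then take array position 3), and the second prints the result indented by two spaces, roughly like jq's default output.

```python
import json


def mesh_entry(filename: str, index: int):
    """Return the MeSH entry at position index (counting from 0) of a saved work record."""
    with open(filename, encoding="utf-8") as f:
        work = json.load(f)
    # Same steps as the jq filter .mesh[3]: look up "mesh", then index the array.
    return work["mesh"][index]


def show(value) -> None:
    """Print a JSON value indented by two spaces, roughly like jq's default output."""
    print(json.dumps(value, indent=2))
```

For example, show(mesh_entry("mosq.json", 3)) should print the same "Mosquito Control" entry that jq printed.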
Software Architecture
- Software Architects: Do We Need 'em by Ruth Malan (by the way, this site has many other excellent articles on software architecture).
- Explaining Software Design
- 5 essential patterns of software architecture
- List of software architecture styles and patterns
- Design Patterns, Architectural Patterns
- 14 software architecture design patterns to know
- Roy Fielding's Misappropriated REST Dissertation is a great read showing how the term REST was appropriated for what we call REST today.
Jeff Bezos on two types of decisions:
Some decisions are consequential and irreversible or nearly irreversible – one-way doors – and these decisions must be made methodically, carefully, slowly, with great deliberation and consultation. If you walk through and don’t like what you see on the other side, you can’t get back to where you were before. We can call these Type 1 decisions. But most decisions aren’t like that – they are changeable, reversible – they’re two-way doors. If you’ve made a suboptimal Type 2 decision, you don’t have to live with the consequences for that long. You can reopen the door and go back through. Type 2 decisions can and should be made quickly by high judgment individuals or small groups.