Designing Rust bindings for REST APIs

REST API binding crates are numerous, but the design patterns could be improved

I co-maintain the gitlab crate for Rust, which helps to communicate with GitLab’s API. In the course of trying to expose such a large API which changes over time, I’ve learned a few things about how to expose REST APIs in Rust idiomatically.

Just a note that in this post, docstrings and #[derive] attributes are largely stripped to keep the examples concise. For the full code, please see the gitlab crate’s api module.

In the beginning…

The first approach that had been used was to have a single Gitlab type with a bunch of methods on it to expose the API endpoints that were wanted in various use cases. This usually involved exposing some useful parameters and returning a concrete type:

impl Gitlab {
    pub fn users(&self) -> Result<Vec<User>, GitlabError> {
        // ...
    }

    pub fn user_by_name(&self, name: &str) -> Result<User, GitlabError> {
        // ...
    }
}

This works…fine as long as you can count the number of distinct calls you need on your fingers and toes. Beyond such simple use cases, problems with this kind of representation in Rust quickly become apparent:

  • the /users endpoint accepts parameters for filtering, sorting, etc.
  • the type returned depends on the access rights of the logged in user (namely, administrators can see the email address)
  • pagination options are hard to represent in such a call

One of the first things done was to handle the variations in the result type by using a trait for the result type:

trait UserResult: DeserializeOwned {}

impl Gitlab {
    pub fn users<U>(&self) -> Result<Vec<U>, GitlabError>
    where
        U: UserResult,
    {
        // ...
    }

    pub fn user_by_name<U>(&self, name: &str) -> Result<U, GitlabError>
    where
        U: UserResult,
    {
        // ...
    }
}

This works, but is already getting wordy. Eventually, other developers started using the crate and needed to pass other parameters to the endpoints that were already exposed. After playing around with &[(&str, &str)] arguments for a little bit, this simple approach ended up being a dead end. Primarily, it forces allocation for anything that isn’t already a Vec<(&str, &str)> or a literal in the code, because the parameters must be contiguous and must be references to str. It is better to instead take an iterator that we can coerce into what we need internally:

impl Gitlab {
    pub fn users<T, I, K, V>(&self, params: I) -> Result<Vec<T>, GitlabError>
    where
        T: UserResult,
        I: IntoIterator,
        I::Item: Borrow<(K, V)>,
        K: AsRef<str>,
        V: AsRef<str>,
    {
        // ...
    }
}

This is suitable for doing whatever one wants with the /users endpoint, but it now has four generic type parameters, and callers need type annotations for the simple &[] parameter case. And this endpoint doesn’t even take any required parameters (such as a project or merge request ID).
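To make the call-site cost concrete, here is a minimal sketch that reuses the same bounds on a free function. The names join_params and call_sites are hypothetical and not part of the gitlab crate, and the helper skips percent-encoding to stay short.

```rust
use std::borrow::Borrow;

/// Hypothetical helper with the same bounds as `users` above. It joins the
/// pairs into a query string (no percent-encoding, to keep the sketch short).
pub fn join_params<I, K, V>(params: I) -> String
where
    I: IntoIterator,
    I::Item: Borrow<(K, V)>,
    K: AsRef<str>,
    V: AsRef<str>,
{
    params
        .into_iter()
        .map(|item| {
            // Works whether the iterator yields `(K, V)` or `&(K, V)`.
            let pair: &(K, V) = item.borrow();
            format!("{}={}", pair.0.as_ref(), pair.1.as_ref())
        })
        .collect::<Vec<_>>()
        .join("&")
}

pub fn call_sites() -> (String, String, String) {
    // A borrowed array literal: `K` and `V` are inferred as `&str`.
    let literal = join_params(&[("active", "true"), ("search", "ben")]);

    // An owned collection works too; nothing forces a `&[(&str, &str)]`.
    let owned: Vec<(String, String)> = vec![("page".into(), "2".into())];
    let from_vec = join_params(owned);

    // An empty list gives inference nothing to work with, so the caller has
    // to spell out the element type.
    let none: &[(&str, &str)] = &[];
    let empty = join_params(none);

    (literal, from_vec, empty)
}
```

The last call is the annoying one: a bare &[] carries no element type, so the caller must write out a type for parameters they are not even passing.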

A new approach is needed

The above API works in that it does the job, but is deficient in a number of ways:

  • It is extremely generic, which requires explicit type annotations for the common case
  • Paged endpoints forced full pagination on the caller (the call returns all results and does not return until every page has been fetched)
  • It is forced synchronous (which was fine at the time, but with async becoming a thing, something had to be done)
  • The result type was fixed in a large number of cases, resulting in either large structs or missing fields for some use cases
  • It’s not very Rust-like:
    • Why are all the parameters strings?
    • What parameters are supported?
    • What escaping, if any, is necessary for values of params?

After adding “just one more” endpoint got to be too much to handle, I went searching to see how this was done elsewhere. I came across two main patterns in the wild:

  • Using traits for collections of endpoints which are then implemented by the client type
  • Datatypes with methods for the endpoints that use them

I found both of these to be unsuitable for GitLab’s API, namely because the crate would still need to ship struct definitions for the types returned by the endpoints, which had gotten to be a maintenance chore. GitLab adds new fields to REST endpoints all the time, which raises the question of how to update the structures when a new version of the API adds fields. I could use Option to represent that a field might not be present in a given GitLab deployment and so gracefully support old deployments which don’t provide the field, but this is conflated with the ability to return null in the JSON object as well. If the field is actually never null, this is also very unfortunate on the consuming side (if the field is even cared about).

Pagination is also not always considered. Endpoints are either paged or not paged, and if you want just one or two pages of results, manually updating parameters is usually required.

Additionally, existing crates tend to be either sync or async and not support the other. For example, the gitea crate is currently async-only, which is annoying because sync code has to spin up an executor of some kind for each call (for reference, the old gitlab crate’s API was sync-only).

Insight and a solution

After dwelling on the problem for a bit, I had an insight that the endpoints and the client do not actually need to be tied together so intimately. Even the return types were not that important. Instead, the idea is to expose each endpoint as its own type and provide combinator functions for how to use these endpoints via a client. The core traits here are Endpoint, Client, and Query.

trait Endpoint {
    fn method(&self) -> Method;
    fn endpoint(&self) -> Cow<'static, str>;
    fn parameters(&self) -> QueryParams {
        QueryParams::default() // Many endpoints don't have parameters
    }
    fn body(&self) -> Result<Option<(&'static str, Vec<u8>)>, BodyError> {
        Ok(None) // Many endpoints also do not have request bodies
    }
}

trait Client {
    type Error: Error + Send + Sync + 'static;
    fn rest_endpoint(&self, endpoint: &str) -> Result<Url, ApiError<Self::Error>>;
    fn rest(&self, request: RequestBuilder, body: Vec<u8>) -> Result<Response<Bytes>, ApiError<Self::Error>>;
}

trait Query<T, C>
where
    C: Client, // needed so that `C::Error` resolves
{
    fn query(&self, client: &C) -> Result<T, ApiError<C::Error>>;
}

Some notes on types which are not shown explicitly here:

  • These traits work on generic http crate types to avoid tying themselves to any specific HTTP client implementation (e.g., reqwest)
  • ApiError is an enum with variants for errors related to the API. There’s a variant for client errors which carries the Client::Error
  • The client only knows about bytes; Query is in charge of converting them into the requested type via serde
  • QueryParams is a type to help manage query parameters with some useful methods for handling optional parameters (rather than having to do opt_param.map(|(k, v)| params.push(k, v)))
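As a rough illustration of that last point, here is a pared-down stand-in for such a parameter type. Params, pairs, and user_search_params are hypothetical names for this sketch, and the real QueryParams may well look different; only the push/push_opt chaining style is taken from the code shown later in this post.

```rust
/// Hypothetical, pared-down stand-in for `QueryParams`.
#[derive(Debug, Default)]
pub struct Params {
    pairs: Vec<(String, String)>,
}

impl Params {
    /// Add a parameter that is always sent.
    pub fn push(&mut self, key: &str, value: impl ToString) -> &mut Self {
        self.pairs.push((key.to_string(), value.to_string()));
        self
    }

    /// Add a parameter only when the caller actually set it.
    pub fn push_opt(&mut self, key: &str, value: Option<impl ToString>) -> &mut Self {
        if let Some(value) = value {
            self.push(key, value);
        }
        self
    }

    /// The collected key/value pairs, in insertion order.
    pub fn pairs(&self) -> &[(String, String)] {
        &self.pairs
    }
}

/// Build parameters for a user search; unset options are simply skipped.
pub fn user_search_params(search: &str, active: Option<bool>, page: Option<u32>) -> Params {
    let mut params = Params::default();
    params
        .push("search", search)
        .push_opt("active", active)
        .push_opt("page", page);
    params
}
```

Returning &mut Self from both methods is what lets required and optional parameters share one chain without a map closure per optional value.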

Those three traits are all that’s needed. Consumers of these traits can take any endpoint and give it to any client given a Query implementation. Luckily, we can provide a Query for any Endpoint to return any type T that can be deserialized:

impl<E, T, C> Query<T, C> for E
where
    E: Endpoint,
    T: DeserializeOwned,
    C: Client,
{
    fn query(&self, client: &C) -> Result<T, ApiError<C::Error>> {
        // compute the URL
        // add query parameters and the body to the request
        // send off the request
        // check the response status and extract errors if needed
        // parse the response body into a `serde_json::Value` named `v`

        serde_json::from_value::<T>(v).map_err(ApiError::data_type::<T>)
    }
}

This works wonderfully and keeps Endpoints from caring about the Client and Client from caring about the details of crafting the right request itself. Basically, the Endpoint is what we’re talking to and the Client knows the how. Later, @boxdot came along and added AsyncQuery and AsyncClient traits for performing these queries using async and .await. The duplicated code is just the above with a .await added in one place to make it all work asynchronously. This allows callers to use whichever is more convenient while not having to reimplement each endpoint for both flavors.
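To see the separation in one place, here is a std-only model of the three traits. It is a sketch under loud assumptions: methods and URLs are plain strings, errors are String, FromBody stands in for serde’s DeserializeOwned, and UserById, CannedClient, and BodyLen are hypothetical names invented for the example.

```rust
use std::borrow::Cow;

// Stand-ins so the sketch needs only `std`: these are simplified versions of
// the traits above, not the crate's real definitions.
pub trait Endpoint {
    fn method(&self) -> &'static str;
    fn endpoint(&self) -> Cow<'static, str>;
}

pub trait Client {
    fn rest_endpoint(&self, endpoint: &str) -> String;
    fn rest(&self, method: &str, url: &str) -> Result<Vec<u8>, String>;
}

/// Plays the role of `DeserializeOwned` (assumption for this sketch).
pub trait FromBody: Sized {
    fn from_body(body: &[u8]) -> Result<Self, String>;
}

pub trait Query<T, C>
where
    C: Client,
{
    fn query(&self, client: &C) -> Result<T, String>;
}

// One blanket impl serves every endpoint, client, and result type.
impl<E, T, C> Query<T, C> for E
where
    E: Endpoint,
    T: FromBody,
    C: Client,
{
    fn query(&self, client: &C) -> Result<T, String> {
        let url = client.rest_endpoint(&self.endpoint());
        let body = client.rest(self.method(), &url)?;
        T::from_body(&body)
    }
}

/// The endpoint knows *what* to ask for.
pub struct UserById {
    pub id: u64,
}

impl Endpoint for UserById {
    fn method(&self) -> &'static str {
        "GET"
    }

    fn endpoint(&self) -> Cow<'static, str> {
        format!("users/{}", self.id).into()
    }
}

/// The client knows *how* to ask; this one just replays a canned response.
pub struct CannedClient {
    pub response: &'static str,
}

impl Client for CannedClient {
    fn rest_endpoint(&self, endpoint: &str) -> String {
        format!("https://gitlab.example.com/api/v4/{}", endpoint)
    }

    fn rest(&self, method: &str, url: &str) -> Result<Vec<u8>, String> {
        if method == "GET" && url.ends_with("/users/1") {
            Ok(self.response.as_bytes().to_vec())
        } else {
            Err(format!("unexpected request: {} {}", method, url))
        }
    }
}

// The caller picks the result type; two toy choices are shown here.
impl FromBody for String {
    fn from_body(body: &[u8]) -> Result<Self, String> {
        String::from_utf8(body.to_vec()).map_err(|err| err.to_string())
    }
}

pub struct BodyLen(pub usize);

impl FromBody for BodyLen {
    fn from_body(body: &[u8]) -> Result<Self, String> {
        Ok(BodyLen(body.len()))
    }
}
```

With these in scope, `let text: String = UserById { id: 1 }.query(&client)?` and `let len: BodyLen = UserById { id: 1 }.query(&client)?` both go through the same blanket impl; only the annotation at the call site changes.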

Combinators as extensions

As you may have noticed, the above Query implementation doesn’t handle pagination and requires that the returned data be turned into a concrete type. Previously, I mentioned that combinator functions were possible with this design and these are how these use cases are implemented. First, there are two trivial combinators: Ignore drops the response data and Raw returns the raw bytes of it:

pub struct Ignore<E> {
    endpoint: E,
}

impl<E, C> Query<(), C> for Ignore<E>
where
    E: Endpoint,
    C: Client,
{
    fn query(&self, client: &C) -> Result<(), ApiError<C::Error>> {
        // as in `impl Query for Endpoint` except we throw away the response data
    }
}

This type wraps an Endpoint and implements Query itself to force returning nothing (()) to save the deserialization step. It also nicely marks in the calling code that we’re explicitly ignoring the returned data. Very similar is the raw combinator:

pub struct Raw<E> {
    endpoint: E,
}

impl<E, C> Query<Vec<u8>, C> for Raw<E>
where
    E: Endpoint,
    C: Client,
{
    fn query(&self, client: &C) -> Result<Vec<u8>, ApiError<C::Error>> {
        // as in `impl Query for Endpoint` except the response data is returned directly (errors are still JSON objects)
    }
}

The only difference here is that the data is returned as raw bytes. This is usually used when fetching the data behind a resource (e.g., a file from the backing git repository). Additionally, note that api::ignore(api::raw(endpoint)) is not valid because Raw does not implement Endpoint. This means that nonsensical combinations are not possible and don’t need to be considered in the logic.
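Here is how the two combinators might look at a call site, again as a std-only sketch with simplified traits. The constructor bodies for ignore and raw are my assumption of the obvious implementation, and ReadmeRaw and StaticClient are hypothetical names for the example.

```rust
// Simplified stand-ins for the real traits (assumptions for this sketch).
pub trait Endpoint {
    fn endpoint(&self) -> String;
}

pub trait Client {
    fn rest(&self, path: &str) -> Result<Vec<u8>, String>;
}

pub trait Query<T, C: Client> {
    fn query(&self, client: &C) -> Result<T, String>;
}

pub struct Ignore<E> {
    endpoint: E,
}

pub struct Raw<E> {
    endpoint: E,
}

/// Wrap an endpoint so that its response data is thrown away.
pub fn ignore<E>(endpoint: E) -> Ignore<E> {
    Ignore { endpoint }
}

/// Wrap an endpoint so that its response data is returned as bytes.
pub fn raw<E>(endpoint: E) -> Raw<E> {
    Raw { endpoint }
}

impl<E: Endpoint, C: Client> Query<(), C> for Ignore<E> {
    fn query(&self, client: &C) -> Result<(), String> {
        client.rest(&self.endpoint.endpoint()).map(|_body| ())
    }
}

impl<E: Endpoint, C: Client> Query<Vec<u8>, C> for Raw<E> {
    fn query(&self, client: &C) -> Result<Vec<u8>, String> {
        client.rest(&self.endpoint.endpoint())
    }
}

/// Hypothetical endpoint for fetching a file's contents.
pub struct ReadmeRaw {
    pub project: u64,
}

impl Endpoint for ReadmeRaw {
    fn endpoint(&self) -> String {
        format!("projects/{}/repository/files/README.md/raw", self.project)
    }
}

/// Hypothetical client that answers every request with the same bytes.
pub struct StaticClient {
    pub body: Vec<u8>,
}

impl Client for StaticClient {
    fn rest(&self, _path: &str) -> Result<Vec<u8>, String> {
        Ok(self.body.clone())
    }
}

// `ignore(raw(ReadmeRaw { project: 1 })).query(&client)` does not type-check:
// `Ignore<E>` only implements `Query` when `E: Endpoint`, and `Raw<_>` is not
// an `Endpoint`.
```

Because each combinator has exactly one Query impl, calls such as `raw(ReadmeRaw { project: 1 }).query(&client)` need no type annotation for the result.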

There’s one more combinator that is of interest for GitLab that allows administrators to call APIs as if they were another user.

pub struct SudoContext<'a> {
    sudo: Cow<'a, str>,
}

impl<'a> SudoContext<'a> {
    // Constructor
    // pub fn new<S>(sudo: S) -> Self

    // Apply to an endpoint
    // pub fn apply<E>(&self, endpoint: E) -> Sudo<'a, E>
}

pub struct Sudo<'a, E> {
    endpoint: E,
    sudo: Cow<'a, str>,
}

impl<'a, E> Endpoint for Sudo<'a, E>
where
    E: Endpoint,
{
    fn method(&self) -> Method {
        self.endpoint.method()
    }

    fn endpoint(&self) -> Cow<'static, str> {
        self.endpoint.endpoint()
    }

    fn parameters(&self) -> QueryParams {
        let mut params = self.endpoint.parameters();
        params.push("sudo", &self.sudo);
        params
    }

    fn body(&self) -> Result<Option<(&'static str, Vec<u8>)>, BodyError> {
        self.endpoint.body()
    }
}

impl<'a, E> Pageable for Sudo<'a, E>
where
    E: Pageable,
{
    fn use_keyset_pagination(&self) -> bool {
        self.endpoint.use_keyset_pagination()
    }
}

There are a few things to notice about the usage here. First, there is a SudoContext type which allows for modifying existing endpoints. Second, the Sudo type just modifies an endpoint, so there is no need to implement Query specifically for it. This also means that it combines with the other combinators gracefully. At the end, you can see the Pageable trait which is used to handle pagination.
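The constructor and apply bodies are elided above, so here is one way they could be filled in, as a sketch only. The Into<Cow<'a, str>> bound on new, the clone in apply, the Vec-based parameters, and the ListIssues endpoint are all assumptions made for this example.

```rust
use std::borrow::Cow;

// Simplified `Endpoint` with parameters as plain pairs (assumption).
pub trait Endpoint {
    fn endpoint(&self) -> Cow<'static, str>;
    fn parameters(&self) -> Vec<(String, String)> {
        Vec::new()
    }
}

pub struct SudoContext<'a> {
    sudo: Cow<'a, str>,
}

impl<'a> SudoContext<'a> {
    /// Accept either a borrowed or an owned user name (assumed bound).
    pub fn new<S>(sudo: S) -> Self
    where
        S: Into<Cow<'a, str>>,
    {
        SudoContext { sudo: sudo.into() }
    }

    /// Wrap an endpoint; the context can be reused for further endpoints.
    pub fn apply<E>(&self, endpoint: E) -> Sudo<'a, E> {
        Sudo {
            endpoint,
            sudo: self.sudo.clone(),
        }
    }
}

pub struct Sudo<'a, E> {
    endpoint: E,
    sudo: Cow<'a, str>,
}

impl<'a, E: Endpoint> Endpoint for Sudo<'a, E> {
    fn endpoint(&self) -> Cow<'static, str> {
        self.endpoint.endpoint()
    }

    fn parameters(&self) -> Vec<(String, String)> {
        // Keep the wrapped endpoint's parameters and append `sudo`.
        let mut params = self.endpoint.parameters();
        params.push(("sudo".to_string(), self.sudo.to_string()));
        params
    }
}

/// Hypothetical endpoint used only to exercise the wrapper.
pub struct ListIssues {
    pub state: &'static str,
}

impl Endpoint for ListIssues {
    fn endpoint(&self) -> Cow<'static, str> {
        "issues".into()
    }

    fn parameters(&self) -> Vec<(String, String)> {
        vec![("state".to_string(), self.state.to_string())]
    }
}
```

Since Sudo<E> is itself an Endpoint, the result of `SudoContext::new("alice").apply(ListIssues { state: "opened" })` can be passed to anything that accepts an endpoint, including the other combinators.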

Pagination

Pagination is a common feature in REST APIs and every service ends up doing it differently. GitLab has two methods for handling pagination: page numbers and keyset pagination. However, keyset pagination is only available on certain endpoints in certain modes, so there is a trait method to know whether to use it or not:

pub trait Pageable {
    fn use_keyset_pagination(&self) -> bool {
        false
    }
}

Before we can make queries for pages, we need to know how much we’re going to fetch. For this, there is the Pagination type:

pub enum Pagination {
    All,
    Limit(usize),
}

impl Default for Pagination {
    fn default() -> Self {
        Pagination::All
    }
}

With these two types, we can finally provide a Paged combinator type:

pub struct Paged<E> {
    endpoint: E,
    pagination: Pagination,
}

impl<E, T, C> Query<Vec<T>, C> for Paged<E>
where
    E: Endpoint,
    E: Pageable,
    T: DeserializeOwned,
    C: Client,
{
    fn query(&self, client: &C) -> Result<Vec<T>, ApiError<C::Error>> {
        // as in `impl Query for Endpoint` except there's a loop to fetch multiple pages
    }
}

Here, we require that the wrapped endpoint is also Pageable before it can be queried through Paged. This ensures that we do not paginate an endpoint that is not intended to be paged. It also forces the result to be a Vec<T> since we’re getting a sequence of result objects of a given type.
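The loop itself is elided above. As a rough sketch of the page-number flavor only, it could look something like the following, where fetch_paged and its fetch_page callback (standing in for one HTTP request) are hypothetical. I am assuming that a short page marks the end of the results; keyset pagination and response headers are ignored here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pagination {
    All,
    Limit(usize),
}

/// Hypothetical page-number loop. `fetch_page(page, per_page)` stands in for
/// one HTTP request and returns that page's items.
pub fn fetch_paged<T, E, F>(
    pagination: Pagination,
    per_page: usize,
    mut fetch_page: F,
) -> Result<Vec<T>, E>
where
    F: FnMut(usize, usize) -> Result<Vec<T>, E>,
{
    if pagination == Pagination::Limit(0) {
        return Ok(Vec::new());
    }

    let per_page = per_page.max(1); // a zero page size would never terminate
    let mut results = Vec::new();
    let mut page = 1; // GitLab page numbers start at 1

    loop {
        let items = fetch_page(page, per_page)?;
        let fetched = items.len();
        results.extend(items);

        // Stop once the caller's limit is satisfied, trimming any overshoot.
        if let Pagination::Limit(limit) = pagination {
            if results.len() >= limit {
                results.truncate(limit);
                break;
            }
        }

        // A short (or empty) page means there is nothing left to fetch.
        if fetched < per_page {
            break;
        }

        page += 1;
    }

    Ok(results)
}
```

The useful property is that Pagination::Limit stops issuing requests as soon as enough results have arrived, which the old all-or-nothing methods could not do.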

Implementing an endpoint

All of that is great, but how are endpoints actually defined in this system? Since they’re only interacted with via a trait so far, the crate itself does not have to provide every endpoint. Instead, consumers can define their own endpoint while they submit it upstream instead of having to wait for an official release in order to use the API. The gitlab crate itself defines endpoints using the derive_builder crate:

#[derive(Debug, Builder)]
#[builder(setter(strip_option))]
pub struct EditIssueNote<'a> {
    #[builder(setter(into))]
    project: NameOrId<'a>,
    issue: u64,
    note: u64,

    #[builder(setter(into))]
    body: Cow<'a, str>,
    #[builder(default)]
    confidential: Option<bool>,
}

impl<'a> Endpoint for EditIssueNote<'a> {
    fn method(&self) -> Method {
        Method::PUT
    }

    fn endpoint(&self) -> Cow<'static, str> {
        format!(
            "projects/{}/issues/{}/notes/{}",
            self.project, self.issue, self.note,
        )
        .into()
    }

    fn body(&self) -> Result<Option<(&'static str, Vec<u8>)>, BodyError> {
        let mut params = FormParams::default();

        params
            .push("body", self.body.as_ref())
            .push_opt("confidential", self.confidential);

        params.into_body()
    }
}

Here, the nice API benefits of the Endpoint trait are shown. The endpoint doesn’t have query parameters, so the default implementation is suitable, but we do have the body() method, which uses the FormParams type to build up an x-www-form-urlencoded body to edit an issue comment. The builder type for EditIssueNote itself ensures that the required parameters are provided while also allowing the optional parameters to be set in a type-safe manner. Creating such an endpoint reads quite well too:

let endpoint = EditIssueNote::builder()
    .project("my/project")
    .issue(50)
    .note(5000)
    .body("An updated comment") // required: `body` has no builder default
    .confidential(true) // make the note confidential
    .build()
    .unwrap();
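For readers who haven’t used derive_builder, here is a hand-written approximation of what such a builder does for a cut-down endpoint. This is only a sketch: EditNote and EditNoteBuilder are hypothetical, the error type is a plain String, and the code that derive_builder actually generates differs in its details.

```rust
use std::borrow::Cow;

/// Cut-down, hypothetical version of the endpoint struct.
#[derive(Debug)]
pub struct EditNote<'a> {
    pub note: u64,
    pub body: Cow<'a, str>,
    pub confidential: Option<bool>,
}

/// Every field starts out unset; `build` checks the required ones.
#[derive(Debug, Default)]
pub struct EditNoteBuilder<'a> {
    note: Option<u64>,
    body: Option<Cow<'a, str>>,
    confidential: Option<bool>,
}

impl<'a> EditNote<'a> {
    pub fn builder() -> EditNoteBuilder<'a> {
        EditNoteBuilder::default()
    }
}

impl<'a> EditNoteBuilder<'a> {
    pub fn note(&mut self, note: u64) -> &mut Self {
        self.note = Some(note);
        self
    }

    // Like `setter(into)`: accept anything convertible into the field type.
    pub fn body<S: Into<Cow<'a, str>>>(&mut self, body: S) -> &mut Self {
        self.body = Some(body.into());
        self
    }

    // Like `strip_option`: take the bare `bool` rather than `Option<bool>`.
    pub fn confidential(&mut self, confidential: bool) -> &mut Self {
        self.confidential = Some(confidential);
        self
    }

    /// Fails if a required field was never set; optional ones stay `None`.
    pub fn build(&self) -> Result<EditNote<'a>, String> {
        Ok(EditNote {
            note: self.note.ok_or("`note` must be initialized")?,
            body: self.body.clone().ok_or("`body` must be initialized")?,
            confidential: self.confidential,
        })
    }
}
```

The point is that a forgotten required parameter surfaces as an Err from build() before any request is made, while optional parameters default to "not sent".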

Note that in all this time, there has been no mention of how the result of such an endpoint is returned. This is an explicit design decision because callers can define their own structs to deserialize into and pull out only the fields that are wanted in a way that is useful for them. There won’t be any failures to deserialize fields that aren’t cared about, API version support can be handled as the caller wants to, and unimportant data is dropped as soon as possible to help alleviate unnecessary allocations. Here is some example code which uses the above endpoint to edit the issue note:

// Get the returned object (the issue note as edited)
let result: MyIssueNoteRepr = endpoint // call the endpoint directly
    .query(&client) // use `client` to perform the request
    .unwrap(); // expect success

// Or explicitly ignore the result
api::ignore(endpoint) // ignore the result and save the JSON deserialization step
    .query(&client) // use `client` to perform the request
    .unwrap(); // expect success (possible errors include the project, issue, or note not existing, permission errors, etc.)

Testing

Another benefit of this API pattern is that it allows for testing the endpoints themselves very easily. Since Client is a trait and not a type itself, we can create mock Client implementations which ensure that the endpoint is sending the right data and test that each parameter is properly passed and escaped as needed. The crate’s test suite includes implementations of clients that also use the derive_builder crate to set the expectations of the test before giving back a Client for use in the test. This has caught real problems and typos in the endpoint implementations and additionally doesn’t require standing up an entire GitLab instance to test against.
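A stripped-down version of the idea might look like the following. This is a sketch, not the crate’s test client: ExpectedRequest, send, and ProjectMembers are hypothetical names, the traits are simplified to plain strings, and expectations are set with a struct literal rather than a builder.

```rust
// Simplified stand-ins for the real traits (assumptions for this sketch).
pub trait Endpoint {
    fn method(&self) -> &'static str;
    fn endpoint(&self) -> String;
    fn parameters(&self) -> Vec<(String, String)> {
        Vec::new()
    }
}

pub trait Client {
    fn rest(
        &self,
        method: &str,
        path: &str,
        params: &[(String, String)],
    ) -> Result<Vec<u8>, String>;
}

/// Hypothetical test double: it answers only the one request it expects.
pub struct ExpectedRequest {
    pub method: &'static str,
    pub path: &'static str,
    pub params: Vec<(&'static str, &'static str)>,
}

impl Client for ExpectedRequest {
    fn rest(
        &self,
        method: &str,
        path: &str,
        params: &[(String, String)],
    ) -> Result<Vec<u8>, String> {
        if method != self.method {
            return Err(format!("expected method {}, got {}", self.method, method));
        }
        if path != self.path {
            return Err(format!("expected path {}, got {}", self.path, path));
        }
        let sent: Vec<(&str, &str)> = params
            .iter()
            .map(|(k, v)| (k.as_str(), v.as_str()))
            .collect();
        if sent != self.params {
            return Err(format!("unexpected parameters: {:?}", sent));
        }
        Ok(Vec::new())
    }
}

/// Hand an endpoint's request data to a client.
pub fn send<E: Endpoint, C: Client>(endpoint: &E, client: &C) -> Result<Vec<u8>, String> {
    client.rest(endpoint.method(), &endpoint.endpoint(), &endpoint.parameters())
}

/// Hypothetical endpoint under test.
pub struct ProjectMembers {
    pub project: u64,
    pub query: Option<&'static str>,
}

impl Endpoint for ProjectMembers {
    fn method(&self) -> &'static str {
        "GET"
    }

    fn endpoint(&self) -> String {
        format!("projects/{}/members", self.project)
    }

    fn parameters(&self) -> Vec<(String, String)> {
        self.query
            .map(|q| ("query".to_string(), q.to_string()))
            .into_iter()
            .collect()
    }
}
```

A test then constructs the expectation, sends the endpoint through it, and asserts on the Result; a typo in the path or a dropped parameter shows up as an Err without any network access.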

Other details

This post mainly covers the high-level view of the design of REST endpoints for the gitlab crate. For more details including things like the ParamValue trait for query and form parameters, NameOrId for fields which support names or numerical IDs, CommaSeparatedList for values which are comma-separated, and more, please see the source code. I may write more about them in future posts as well.