

Hey everyone,
GSoC 2024 is now officially over. This blog post is the final report on my project for ‘Git’. Below are some preliminary details:


Summary:

Late last year, a new unit testing framework written in C was merged into Git in e137fe3b29 (unit tests: add TAP unit test framework, 2023-11-09). Before this framework existed, we used to write a CLI helper for every library/API we wanted to test, and shell scripts would call that helper to check the output for a given input. This was because the end-to-end tests were also written as shell scripts, and Git had no other way of testing before the new framework was merged. The unit testing framework helps in understanding how each function call behaves, and it also makes it easy to build a complex custom setup for a particular library or a function of a library, which was rather difficult with shell scripts.

Therefore, my project involved converting these existing shell-script-based unit tests to the new unit testing framework, written solely in C.

Work

I have posted 9 test migrations to the mailing list. 7 of them are currently merged into ‘master’ or ‘next’, and 2 are under discussion/review on the mailing list. I have also done another experimental migration, which I will talk about in the Future Work section. Besides migrating the unit tests themselves, I have contributed to other areas as well, like introducing a new library called ‘lib-oid’ for the unit tests to use, which generates an object_id from an arbitrary hex string based on the hash algo used. My pre-GSoC work is listed in my proposal. Below is the list of patches with their commits, mailing list discussion links and status:

  • t-strcmp-offset

    Status: Merged into Master
    Commits:
    4d00d948ff (t/: port helper/test-strcmp-offset.c to unit-tests/t-strcmp-offset.c, 2024-05-20)
    Mailing list: Thread

  • t-example-decorate

    Status: Merged into Master
    Commits:
    456b4dce4c (t/: migrate helper/test-example-decorate to the unit testing framework, 2024-05-28)
    Mailing list: Thread

  • t-hash

    Status: Merged into Master
    Commits:
    a70f8f19ad (strbuf: introduce strbuf_addstrings() to repeatedly add a string, 2024-05-29)
    2794932548 (t/: migrate helper/test-{sha1, sha256} to unit-tests/t-hash, 2024-05-29)
    Mailing list: Thread

  • t-oidtree

    Status: Merged into Master
    Commits:
    ed54840872 (t/: migrate helper/test-oidtree.c to unit-tests/t-oidtree.c, 2024-06-08)
    Mailing list: Thread

  • t-oidmap

    Status: Merged into Master
    Commits:
    28c1c07700 (t: migrate helper/test-oidmap.c to unit-tests/t-oidmap.c, 2024-07-03)
    Mailing list: Thread

  • t-hashmap

    Status: Merged into Master
    Commits:
    3469a23659 (t: port helper/test-hashmap.c to unit-tests/t-hashmap.c, 2024-08-03)
    Mailing list: Thread

  • t-urlmatch-normalization

    Status: Merged into Master
    Commits:
    05026637f3 (t: migrate t0110-urlmatch-normalization to the new framework, 2024-08-20)
    Mailing list: Thread

  • t-oid-array

    Status: Posted to the list, under review
    Mailing list: Thread

  • t-oidset

    Status: Posted to the list, under review
    Mailing list: Thread

Future Work

Currently, t-oid-array and t-oidset are not yet merged into ‘seen’, but I am working on getting them merged. Earlier in the report I mentioned an experimental migration: that was t-revision-walking. This unit test uses a repository to make commits and perform a revision walk over them. One limiting factor of the current framework, which prevented more migrations, is the unavailability of a test repository. The shell-script-based tests had a test repository available to them, which could be set up using setup_git_directory() in the ‘C’ part of the test. To solve this, I created a new library for the unit tests to use called ‘lib-repo’, which created a new repository and populated a ‘struct repository *’ that was passed down to it.

This, however, did not turn out the way I had envisioned. For almost all the tasks (i.e. staging, committing, etc.), I had to spawn git sub-processes, which defeated the purpose of writing the test in C. Only for creating and setting up the repo could I actually use library functions (i.e. init_db() from setup.h). There was also the problem of the ‘the_repository’ global variable: if you pass down a custom ‘struct repository *’ which is not ‘the_repository’, many of the libraries you would be testing could segfault, because ‘the_repository’ is used implicitly in many places. I ran into this while migrating ‘helper/test-revision-walking’. So, a good feature to work on in the future would be to provide a test repository to the unit tests.

Final Words

Overall, I would say it was quite a transformative experience. I went from not even knowing what a mailing list was to being on one and participating on a regular basis, and contributing to one of the most important open-source projects was really great.

I faced some difficulties when I started to contribute to ‘Git’. The most challenging of them all was finding work to do. I mostly just searched the codebase for ‘TODO:’, ‘FIXME:’ and other markers, and occasionally #leftoverbits on the list, but those provided very little context about the problem. I was used to GitHub issues, where you can filter by difficulty, by the areas of the codebase affected, etc. The unorganized nature of ‘Git’ was rather difficult at first, and I thought that only the people working on ‘Git’ as part of their $DAYJOB would have a clear understanding of what long-term technical goal they wanted to achieve while contributing. I still somewhat think the same today.

I also want to thank my mentors Christian and Kaartic for helping me during this process, and everyone else who has reviewed my code or interacted with me on the list. I have a $DAYJOB now, which can be attributed to my participation in GSoC and working on ‘Git’. I still plan to help out beginners who want to contribute to ‘Git’; however, I will probably produce less code than I used to for a month or two while navigating my first $DAYJOB. I will still get t-oid-array and t-oidset merged and occasionally participate on the list. I am particularly interested in Rust’s integration into the ‘Git’ codebase, which is under discussion now.

Thanks & Regards,
Ghanshyam.