Leverage the Power of LLMs to Understand Code Repos

Debarghya Das, a former engineer at Google Search (ranking and infra) and Facebook who now invests in AI, SaaS, and infra startups, shares how he uses LLMs to understand GitHub code repositories quickly. He demonstrates the approach on an example repository, and you can try it yourself.

Even with 10 years of engineering experience, he finds dissecting a large codebase daunting.

Here’s a quick method using LLMs:
- Dump the code into one big file.
- Feed it to Gemini 1.5 Pro (2M-token context).
- Ask it anything.
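
The first step can be sketched in a few lines. The version below is a minimal illustration in Python, not the bash script the author describes. The skip list, the extension filter, and the `===== FILE: ... =====` header format are all assumptions chosen for the example.

```python
import os
from pathlib import Path

# Hypothetical defaults, chosen for illustration only.
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}
TEXT_EXTENSIONS = {".py", ".md", ".txt", ".json", ".yaml", ".yml", ".toml", ".cfg", ".sh"}


def dump_repo(repo_root, output_path):
    """Write every matching text file under repo_root into one file.

    Each file is preceded by a header line holding its relative path.
    Returns the number of files written.
    """
    root = Path(repo_root)
    out_resolved = Path(output_path).resolve()
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune skipped directories in place so os.walk does not descend into them.
            dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
            for name in sorted(filenames):
                path = Path(dirpath) / name
                if path.suffix.lower() not in TEXT_EXTENSIONS:
                    continue
                if path.resolve() == out_resolved:
                    continue  # never include the dump file in itself
                text = path.read_text(encoding="utf-8", errors="replace")
                out.write(f"===== FILE: {path.relative_to(root).as_posix()} =====\n")
                out.write(text)
                if not text.endswith("\n"):
                    out.write("\n")
                out.write("\n")
                count += 1
    return count
```

Calling `dump_repo("path/to/cloned/repo", "dump.txt")` (both paths are placeholders) produces one file that can be pasted or uploaded as context. Filtering by extension keeps binaries such as model weights and images out of the dump, which matters when the goal is to stay inside the context window.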

Example: DeepFaceLab Deepfake Repo
- Clone the repo and ask Claude to generate bash scripts to dump raw contents into one file.
- Final Dump: Contains filenames and their contents.
- Generate a list of questions: Ask Claude for ways to understand the code repository.
- Feed contents and questions to Gemini 1.5 Pro.
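
As a rough illustration of the last step, the sketch below combines a dump and a question list into a single prompt and checks it against the 2M context figure. It assumes the dump is already loaded as a string. The four-characters-per-token ratio is a crude rule of thumb rather than a real tokenizer, and the function names are hypothetical. No model API is called here; the resulting string is what you would paste or send to the model.

```python
CONTEXT_LIMIT_TOKENS = 2_000_000  # the 2M context figure quoted in the post
CHARS_PER_TOKEN = 4  # assumption: rough rule of thumb, not a real tokenizer


def estimate_tokens(text):
    """Crude token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def build_prompt(dump_text, questions):
    """Combine the repository dump and a numbered question list into one prompt."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, start=1))
    return (
        "Below is a dump of a code repository. Each file starts with a header line.\n\n"
        f"{dump_text}\n\n"
        "Answer the following questions about the repository:\n"
        f"{numbered}\n"
    )


def fits_in_context(prompt, limit=CONTEXT_LIMIT_TOKENS):
    """Return True if the estimated token count is within the limit."""
    return estimate_tokens(prompt) <= limit
```

If `fits_in_context` returns False, the usual remedy is to narrow the dump (fewer file types or directories) rather than to truncate it mid-file.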

Specifics to Ask:
- Generate a graph of the codebase components and their interactions.
- Draw neural net architectures for specific models.
- Describe each file and its function in a table.
- List external dependencies and their purposes in a table.
- Suggest the future roadmap and needed additions.
- Recommend required reading for better understanding.
- Find similar or related open-source repositories.

Outcome:
This method is a game-changer for getting up to speed with a new codebase, which is a challenging engineering problem in its own right. AI may overpromise at times, but here it makes real progress on a hard problem!