9.2. Exploring a Legacy Codebase
If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
—Rob Pike
The goal of exploration is to understand the app from both the customers’ and the developers’ point of view. The specific techniques you use may depend on your immediate aims:
You’re brand new to the project and need to understand the app’s overall architecture, documenting as you go so others don’t have to repeat your discovery process.
You need to understand just the moving parts that would be affected by a specific change you’ve been asked to make.
You’re looking for areas that need beautification because you’re in the process of porting or otherwise updating a legacy codebase.
Just as we explored SaaS architecture in Chapter 3 using height as an analogy, we can follow some “outside-in” steps to understand the structure of a legacy app at various levels:
1. Check out a scratch branch to run the app in a development environment
2. Learn and replicate the user stories, working with other stakeholders if necessary
3. Examine the database schema and the relationships among the most important classes
4. Skim all the code to quantify code quality and test coverage
Since operating on the live app could endanger customer data or the user experience, the first step is to get the application running in a development or staging environment in which perturbing its operation causes no inconvenience to users. Create a scratch branch of the repo that you never intend to check back in and can therefore be used for experimentation. Create a development database if there isn’t an existing one used for development. An easy way to do this is to clone the production database if it isn’t too large, thereby sidestepping numerous pitfalls:
The app may have relationships such as has-many or belongs-to that are reflected in the table rows. Without knowing the details of these relationships, you might create an invalid subset of data. Using RottenPotatoes as an example, you might inadvertently end up with a review whose movie_id and moviegoer_id refer to nonexistent movies or moviegoers.
Cloning the database eliminates possible differences in behavior between production and development resulting from differences in database implementations, differences in how certain data types such as dates are represented in different databases, and so on.
Cloning gives you realistic valid data to work with in development.
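To make the first pitfall concrete, here is a minimal sketch of how a hand-picked subset of rows can leave reviews pointing at movies or moviegoers that were never copied. The rows are hypothetical in-memory hashes rather than real tables, and the helper name orphans is an assumption made for illustration, not part of Rails or RottenPotatoes.

```ruby
# Hypothetical rows standing in for a partially copied database.
movies     = [{ id: 1, title: "Alien" }, { id: 2, title: "Up" }]
moviegoers = [{ id: 10, name: "Mona" }]
reviews    = [
  { id: 100, movie_id: 1, moviegoer_id: 10 },  # both references are valid
  { id: 101, movie_id: 3, moviegoer_id: 10 },  # movie 3 was not copied
  { id: 102, movie_id: 2, moviegoer_id: 11 }   # moviegoer 11 was not copied
]

# Return the rows whose foreign key matches no id in the parent table.
# (Hypothetical helper, written for this example only.)
def orphans(rows, foreign_key, parents)
  parent_ids = parents.map { |parent| parent[:id] }
  rows.reject { |row| parent_ids.include?(row[foreign_key]) }
end

bad_movie_refs     = orphans(reviews, :movie_id, movies)
bad_moviegoer_refs = orphans(reviews, :moviegoer_id, moviegoers)

puts "Reviews with a missing movie: #{bad_movie_refs.map { |r| r[:id] }.inspect}"
puts "Reviews with a missing moviegoer: #{bad_moviegoer_refs.map { |r| r[:id] }.inspect}"
```

Running a check like this against a candidate development dataset is one way to notice dangling references before they surface as confusing errors in the app.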
If you can’t clone the production database, or you have successfully cloned it but it’s too unwieldy to use in development all the time, you can create a development database by extracting fixture data from the real database using the steps in Figure 9.3.
# on production computer:
RAILS_ENV=production rake db:schema:dump
RAILS_ENV=production rake db:fixtures:extract
# copy db/schema.rb and test/fixtures/*.yml to development computer
# then, on development computer:
rake db:create # uses RAILS_ENV=development by default
rake db:schema:load
rake db:fixtures:load
Once the app is running in development, have one or two experienced customers demonstrate how they use the app, indicating during the demo what changes they have in mind (Nierstrasz et al. 2009). Ask them to talk through the demo as they go; although their comments will often be in terms of the user experience (“Now I’m adding Mona as an admin user”), if the app was created using BDD, the comments may reflect examples of the original user stories and therefore the app’s architecture. Ask frequent questions during the demo, and if the maintainers of the app are available, have them observe the demo as well. In Section 9.3 we will see how these demos can form the basis of “ground truth” tests to underpin your changes.
Once you have an idea of how the app works, take a look at the database schema; Fred Brooks, Rob Pike, and others have all acknowledged the importance of understanding the data structures as a key to understanding the app logic. You can use an interactive database GUI to explore the schema, but you might find it more efficient to run rake db:schema:dump, which creates a file db/schema.rb containing the database schema in the migrations DSL introduced in Section 4.2. The goal is to match up the schema with the app’s overall architecture.
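One quick way to start matching the schema to the architecture is to list, for each table, the columns that look like foreign keys. The sketch below assumes a small, hypothetical excerpt of a db/schema.rb file (the tables and columns are illustrative, not the real RottenPotatoes schema) and assumes the common convention that foreign-key columns end in _id. The helper foreign_keys_by_table is likewise made up for this example.

```ruby
# Hypothetical excerpt of a db/schema.rb file, held as a string for the example.
schema = <<~SCHEMA
  create_table "movies", force: :cascade do |t|
    t.string "title"
    t.string "rating"
  end

  create_table "reviews", force: :cascade do |t|
    t.integer "potatoes"
    t.integer "movie_id"
    t.integer "moviegoer_id"
  end
SCHEMA

# Map each table name to the columns whose names end in _id.
# (Hypothetical helper; it relies on naming conventions, not on real constraints.)
def foreign_keys_by_table(schema_text)
  result = {}
  current_table = nil
  schema_text.each_line do |line|
    if (match = line.match(/create_table "(\w+)"/))
      current_table = match[1]
      result[current_table] = []
    elsif current_table && (match = line.match(/t\.\w+\s+"(\w+_id)"/))
      result[current_table] << match[1]
    end
  end
  result
end

foreign_keys_by_table(schema).each do |table, keys|
  puts "#{table}: #{keys.empty? ? '(no foreign keys)' : keys.join(', ')}"
end
```

In practice you would pass File.read("db/schema.rb") instead of the inline string; tables with many _id columns are good candidates for join or association-heavy models.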
Figure 9.4 shows a simplified Unified Modeling Language (UML) class diagram generated by the railroady gem that captures the relationships among the most important classes and the most important attributes of those classes. While the diagram may look overwhelming initially, since not all classes play an equally important structural role, you can identify “highly connected” classes that are probably central to the application’s functions. For example, in Figure 9.4, the Customer and Voucher classes are connected to each other and to many other classes. You can then identify the tables corresponding to these classes in the database schema.
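The idea of “highly connected” classes can be made concrete by treating the class diagram as a graph and counting how many associations touch each class. In the sketch below, only Customer and Voucher come from the text; every other class name and every edge is a hypothetical stand-in for whatever your own diagram shows.

```ruby
# Hypothetical associations: each pair is one edge in the class diagram.
# Only Customer and Voucher are named in the text; the rest are made up.
associations = [
  %w[Customer Voucher],
  %w[Customer Order],
  %w[Customer Donation],
  %w[Voucher Showdate],
  %w[Voucher Vouchertype],
  %w[Order Item]
]

# Count how many edges touch each class (its degree in the graph).
degree = Hash.new(0)
associations.each do |first, second|
  degree[first]  += 1
  degree[second] += 1
end

# Sort from most to least connected, breaking ties alphabetically.
ranked = degree.sort_by { |name, count| [-count, name] }
ranked.each { |name, count| puts "#{name}: #{count}" }
```

The classes at the top of such a ranking are reasonable places to start reading, since a change to them is likely to ripple through the rest of the app.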
Having familiarized yourself with the app’s architecture, most important data structures, and major classes, you are ready to look at the code. The goal of inspecting the code is to get a sense of its overall quality, test coverage, and other statistics that serve as a proxy for how painful it may be to understand and modify. Therefore, before diving into any specific file, run rake stats to get the total number of lines of code and lines of tests for each file; this information can tell you which classes are most complex and therefore probably most important (highest LOC), best tested (best code-to-test ratio), simple “helper” classes (low LOC), and so on, deepening the understanding you bootstrapped from the class diagram and database schema. (Later in this chapter we’ll show how to evaluate code with some additional quality metrics to give you a heads-up about where the hairiest efforts might be.) If test suites exist, run them; assuming most tests pass, read the tests to help understand the original developers’ intentions. Then spend one hour (Nierstrasz et al. 2009) inspecting the code in the most important classes as well as those you believe you’ll need to modify (the change points), which by now you should be getting a good sense of.
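To see what a lines-of-code count and a code-to-test ratio actually measure, the following sketch counts non-blank, non-comment lines in two hypothetical source strings. It is a rough approximation for illustration only and is not how rake stats is implemented; the helper name loc and both code samples are assumptions.

```ruby
# Count lines that are neither blank nor pure comments.
# (Hypothetical helper; a rough approximation of "lines of code".)
def loc(source)
  source.each_line.count do |line|
    stripped = line.strip
    !stripped.empty? && !stripped.start_with?("#")
  end
end

# Hypothetical file contents; in practice you would use File.read on real files.
app_code = <<~APP
  # A movie with a title
  class Movie
    def initialize(title)
      @title = title
    end

    def to_s
      @title
    end
  end
APP

test_code = <<~SPEC
  describe Movie do
    it "prints its title" do
      expect(Movie.new("Up").to_s).to eq("Up")
    end
  end
SPEC

code_loc = loc(app_code)
test_loc = loc(test_code)
ratio    = test_loc.to_f / code_loc

puts "Code LOC: #{code_loc}, Test LOC: #{test_loc}, test lines per code line: #{ratio.round(2)}"
```

A class with many lines of code and few lines of tests is a likely trouble spot, which is why these two numbers are worth collecting before reading any file in depth.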
Self-Check 9.2.1. What are some reasons it is important to get the app running in development even if you don’t plan to make any code changes right away?
A few reasons include:
For SaaS, the existing tests may need access to a test database, which may not be accessible in production.
Part of your exploration might involve the use of an interactive debugger or other tools that could slow down execution, which would be disruptive on the live site.
For part of your exploration you might want to modify data in the database, which you can’t do with live customer data.