This week I learned 4
February 22, 2025
Analyzing code, beauty and questions
“Readable code”
This is a pet peeve of mine by now.
Arguing that one piece of code is more “beautiful”, “readable” or “clean” than another is not an argument. These terms are meaningless because their meaning varies from person to person, depends on each person's experience with different patterns and programming languages, and because they exist on a spectrum.
At the extremes of this spectrum, we can probably all agree - for example, a single line containing five nested ternary operators including some bit shifting is neither “clean” nor “beautiful”. However, as you move toward the middle of the spectrum, things become progressively muddier.
That’s why I would be very happy if we instead focused on identifying the technical advantages or disadvantages of an approach.
- Is it simpler to change in the future?
- Is it more performant?
- Does it use outdated APIs?
- What are the contexts/use cases for an API you are designing? Are 80% of the usages simple, or are you making the caller's life difficult?
- Will the caller fall into the pit of success or shoot themselves in the foot? How can you avoid that?
Code analysis
What are you looking for when you are tasked with analyzing a program you have never seen before?
These are the things that I can think of.
Credentials in code
Credentials should be provided via environment variables, config files or database values. If they are hard-coded instead, these are the problems:
- You might accidentally share access to your production data.
- Switching between environments will happen by commenting out code. This increases the probability of accidentally committing the wrong environment and could mean operating on the wrong data in production or on another developer's machine.
To load environment variables from a file, there are libraries like dotenv in most programming languages.
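As a minimal sketch of what this can look like in Python, the snippet below reads credentials from the environment using only the standard library. The variable names `DB_HOST`, `DB_USER` and `DB_PASSWORD` and the `DbConfig` class are assumptions made up for this example, not something a particular library prescribes.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbConfig:
    host: str
    user: str
    password: str


# Hypothetical variable names, chosen only for this example.
REQUIRED_KEYS = ("DB_HOST", "DB_USER", "DB_PASSWORD")


def load_db_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    # Fail loudly at startup instead of silently falling back to a default.
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return DbConfig(
        host=env["DB_HOST"],
        user=env["DB_USER"],
        password=env["DB_PASSWORD"],
    )
```

Passing the mapping as a parameter keeps the function testable without touching the real environment. With a library such as python-dotenv, calling its `load_dotenv()` before `load_db_config()` would fill `os.environ` from a `.env` file first.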
Memory leaks/unclosed connections
Memory leaks increase the memory consumption of a long-running service until it blows up. However, how exactly you can provoke this depends on the programming language.
def get_data():
    conn = create_db_conn()
    # Do stuff with conn
In Java, this kind of code would leak connections, and the memory behind them, sooner or later. In Python, it might not.
The reference implementation, CPython, uses reference counting to dispose of values that are no longer referenced from anywhere in the program. The same goes for Swift and for Rust's Rc<T> and Arc<T>. This means that when no code references conn anymore, its __del__ method will be called and it will be collected, eventually. Therefore, if your database library of choice calls self.close() in the __del__ method, you should not leak connections and memory, eventually.
Eventually is doing a lot of the heavy lifting here, because it is not deterministic when garbage collection runs: objects caught in reference cycles are only freed once the cycle collector gets to them, and other implementations such as PyPy do not use reference counting at all. That means it is always better to explicitly dispose of connections or other closeable things, via with if a context manager exists, or via try/finally otherwise.
def get_data():
    with create_db_conn() as conn:
        # Do stuff with conn
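Here is a runnable sketch of both options, using the standard library's sqlite3 with an in-memory database as a stand-in for create_db_conn(). One caveat worth knowing: using a sqlite3 connection directly in a with statement only commits or rolls back the transaction and does not close the connection, which is why this example wraps it in contextlib.closing. The table and function names are made up for illustration.

```python
import sqlite3
from contextlib import closing


def get_data() -> list[tuple]:
    # closing() calls conn.close() on exit, even if the body raises.
    with closing(sqlite3.connect(":memory:")) as conn:
        conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO person VALUES (1, 'Ada')")
        return conn.execute("SELECT id, name FROM person").fetchall()


def get_data_try_finally() -> list[tuple]:
    # The same guarantee, spelled out by hand for objects that
    # do not offer a context manager.
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT 1").fetchall()
    finally:
        conn.close()
```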
SQL injections
Well, not much to say here. If you find something like the following code, you probably have a problem. Use prepared statements - always!
@app.get("/")
def home(id: str):
    query = f"SELECT * FROM person WHERE id = '{id}'"
    with create_db_conn() as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
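For contrast, here is a sketch of the same lookup with a parameterized query. It uses sqlite3 from the standard library, so the placeholder is `?`; other drivers use a different style (psycopg, for example, uses `%s`). The `person` table contents and the helper names are assumptions for this example.

```python
import sqlite3
from contextlib import closing


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical demo data, only here to make the example runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO person VALUES (?, ?)",
        [("1", "Ada"), ("2", "Grace")],
    )
    return conn


def find_person(conn: sqlite3.Connection, person_id: str) -> list[tuple]:
    # The value is passed separately from the SQL text, so the driver
    # never interprets it as part of the statement.
    with closing(conn.cursor()) as cursor:
        cursor.execute("SELECT id, name FROM person WHERE id = ?", (person_id,))
        return cursor.fetchall()
```

With the f-string version, an input such as `' OR '1'='1` would match every row. Here it is just an id that does not exist.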
Concurrency issues
If your language supports coroutines and cooperative scheduling, you need to make sure that you are not calling some synchronous method somewhere that will block your whole event loop or a worker thread.
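The effect can be sketched with asyncio. A hypothetical heartbeat coroutine records a timestamp every 20 ms; when a synchronous call (simulated here with time.sleep) runs directly on the event loop, the heartbeat stalls, whereas offloading the same call with asyncio.to_thread (Python 3.9+) keeps it ticking. All function names are made up for this example.

```python
import asyncio
import time


def blocking_io() -> str:
    # Stand-in for a synchronous driver call or file read.
    time.sleep(0.2)
    return "done"


async def heartbeat(ticks: list[float]) -> None:
    # Records when the event loop actually got around to running us.
    for _ in range(5):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def max_gap(offload: bool) -> float:
    ticks: list[float] = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    if offload:
        await asyncio.to_thread(blocking_io)  # runs in a worker thread
    else:
        blocking_io()  # blocks the whole event loop
    await task
    # Largest pause between two consecutive heartbeat ticks, in seconds.
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))
```

Running asyncio.run(max_gap(offload=False)) should report a gap of roughly the full 0.2 seconds, while the offloaded variant stays close to the 20 ms interval.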
Code issues
This is much looser than the other points, but there are two things that I find invaluable.
One is to always think about side effects (any IO, e.g. database, network or file access). Consider the following function.
def calc_complex_things(conn):
    data = get_data_from_database(conn)
    return data.x * data.y * data.z
This ties the actual calculation to how the data is retrieved. What if we separated out how the data is retrieved?
def calc_complex_things(data):
    return data.x * data.y * data.z
Now it is possible to test the actual calculation without any external setup, which makes testing much faster and also allows us to use something like hypothesis to do property-based testing. Also, this function can be used in a variety of contexts without accidentally overloading the database when it is called in a hot loop.
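A sketch of what such a test could look like, assuming a small `Data` dataclass as the input type (the original does not say what `data` is). To stay within the standard library, seeded random inputs stand in for what hypothesis would generate; this is a poor man's property check, not hypothesis itself.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Data:
    # Hypothetical shape of the input; any object with x, y and z works.
    x: int
    y: int
    z: int


def calc_complex_things(data: Data) -> int:
    return data.x * data.y * data.z


def test_calc_complex_things() -> None:
    # Example-based check: no database, no fixtures.
    assert calc_complex_things(Data(2, 3, 4)) == 24

    # Property-style check: the result does not depend on the
    # order of the factors.
    rng = random.Random(0)
    for _ in range(100):
        d = Data(*(rng.randint(-1000, 1000) for _ in range(3)))
        assert calc_complex_things(d) == calc_complex_things(Data(d.z, d.y, d.x))
```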
The other thing is something like this:
def foo(x: int):
    if x == 5:
        y = Person(name="Test")
    else:
        print("Wohooo!")
    # 100 lines later, that do not use y
    z = calc_z()
    if z > 0:
        # implicitly means x == 5, therefore this is ok
        print(y.name)
This might all work today, but it will blow up at some point, because by then z > 0 might not imply x == 5 anymore and y will be unbound when it is read. y should be declared close to where it is used.
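To make the failure mode concrete, here is a reduced sketch. It passes z in as a parameter instead of calling calc_z() and returns the name instead of printing it, both purely so the behavior can be checked; `Person` is a stand-in dataclass. The second function shows one possible way to keep y next to its use, which assumes the object does not actually depend on x.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str


def foo(x: int, z: int) -> str:
    if x == 5:
        y = Person(name="Test")
    # ... many lines that do not use y ...
    if z > 0:
        # Only safe as long as z > 0 really implies x == 5.
        return y.name
    return ""


def foo_closer(x: int, z: int) -> str:
    if z > 0:
        # y is created right where it is needed, so no hidden
        # invariant between x and z is required.
        y = Person(name="Test")
        return y.name
    return ""
```

As soon as the implicit invariant breaks, for example with foo(4, 1), the first version raises UnboundLocalError, while the second keeps working.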
Notice how I didn’t call this “not beautiful” or “not clean” but instead focused on potential downsides.
Architecture
I think architecture
Question feature requests
When I was younger and tasked with implementing some features, I often tried to just implement what made sense to me. Over the years, that approach has changed.
Most feature requests I receive are not sufficiently defined. Therefore, I almost always need to ask what really needs to be done or clarify something. The lesson here is that what makes sense to me often does not make sense in a larger context that I don’t have. So ask questions, even if you think you know what you should do.