Best Practices When Programming (with code)

Many people enjoy programming and solving algorithmic puzzles in their favorite language, but nobody likes diving into old code or, even worse, code written by someone else.
I’ve had to do it several times. Sometimes it was bad, other times worse, and I often ended up rewriting the parts that made my eyes bleed. That is how I learned some best practices for programming.

Warning: This comes from a Data Scientist’s perspective, so these cases generally apply to ETL and Machine Learning models.


💭 Think Before You Code

I’ve often seen people receive a task, sit down, and immediately start typing to get it done. That generally produces very bad code.

The lines of code reflect the logic the developer used to solve the problem, so if that logic isn’t clear in their head, the code will be a difficult thread to follow.

Thinking the solution through first will help you name variables, write comments and divide the code into parts.


▶️ Code Should Run Sequentially

With interactive languages like Python, R and Julia, I’ve often seen scripts that only work when run in pieces, something like: run lines 10 to 35, then run lines 1 and 2, then take the result and paste it into line 40.

That’s not programming, it’s spreadsheet-ing in R or Python. A script should produce its result when run once, from the first line to the last.
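
As a minimal sketch of a script that runs sequentially, the example below puts every step in a function and calls them in order from a single main(). The function names and the sample numbers are hypothetical; the only point is that one top-to-bottom run yields the result.

```python
# Hypothetical example: every step runs in order, top to bottom.

def load_prices():
    # Stand-in for reading a file or querying a database.
    return [100.0, 250.0, 80.0]


def calculate_taxes(prices, tax_rate):
    # Each step receives its input explicitly instead of relying on
    # values pasted in by hand from an earlier partial run.
    return [price * tax_rate for price in prices]


def main():
    prices = load_prices()
    taxes = calculate_taxes(prices, tax_rate=0.16)
    total = sum(taxes)
    print(f"Total tax: {total:.2f}")
    return total


if __name__ == "__main__":
    main()
```

Nothing here depends on which lines were run before, so anyone can rerun the whole file and get the same output.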


🔄 If It’s an ETL, Give It an ETL Structure

Try to use predefined structures. When processing data, many of your processes will be ETL (extract, transform, load). If possible, follow these steps:

  1. Environment configuration script
  2. Libraries
  3. Load Data
  4. Manipulate data
  5. Save the result
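
The steps above can be sketched as one small script. This is only an outline under assumptions: the amount column and the 16% rate are made up for illustration, it uses just the standard library, and it follows the Python convention of placing imports before the configuration values.

```python
# Libraries
import csv

# Environment configuration (hypothetical value)
TAX_RATE = 0.16


def load_data(path):
    # Load: read every row of the CSV file as a dictionary.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def manipulate_data(rows, tax_rate):
    # Manipulate: add a tax column computed from the numeric amount.
    return [
        {**row, "tax": round(float(row["amount"]) * tax_rate, 2)}
        for row in rows
    ]


def save_result(rows, path):
    # Save: write the header and the transformed rows.
    # Assumes there is at least one row to take the column names from.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def run_etl(input_path, output_path):
    rows = load_data(input_path)
    rows = manipulate_data(rows, TAX_RATE)
    save_result(rows, output_path)
    return rows
```

Keeping load, manipulate and save as separate functions makes it easy to see which stage failed and to replace one of them without touching the others.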

✂️ Divide the Code Into Parts

Try to have each script perform one particular task and save its result to some storage for the next stage. The only thing two stages should have in common is that the later one reads the data the earlier one saved.
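
A minimal sketch of two such stages, assuming a hypothetical JSON file as the intermediate storage: the first stage only cleans and saves, the second only reads that file and aggregates. In a real project each function would probably live in its own script.

```python
import json


def stage_clean(raw_rows, intermediate_path):
    # Stage 1: drop rows without an amount and persist the result.
    clean_rows = [row for row in raw_rows if row.get("amount") is not None]
    with open(intermediate_path, "w", encoding="utf-8") as handle:
        json.dump(clean_rows, handle)


def stage_summarize(intermediate_path):
    # Stage 2: knows nothing about stage 1 except the file it left behind.
    with open(intermediate_path, encoding="utf-8") as handle:
        clean_rows = json.load(handle)
    return sum(row["amount"] for row in clean_rows)
```

Because the stages only share a file, the second one can be rerun or debugged on its own without repeating the first.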


🏷️ Give Variables Good Names

It will help you return to the code in the future:

  1. Name functions as verbs that describe what they do
  2. Name objects with nouns that describe them
  3. Pick a naming convention and stick to it. If an object represents a big apple, name it big_apple. If you prefer another standard such as bigApple, use it, but be consistent
  4. If two tables have a column that holds the same value, give that column the same name in both

# ❌ Bad
def data(x):
    return x * 0.16

t = data(100)

# ✅ Good
def calculate_tax(price):
    return price * 0.16

total_tax = calculate_tax(100)

🔤 Don’t Use Non-ASCII Characters

Don’t create variable, column or file names with characters like ñ or áéíóú, or with spaces. At some point they will cause a problem, for example with file encodings or with tools that expect plain ASCII.

# ❌ Bad
año_fiscal = 2024
próximo_año = 2025

# ✅ Good
fiscal_year = 2024
next_year = 2025
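
When the names come from outside, for example column headers in a file someone sent you, they can be cleaned once at load time. The helper below is a hypothetical sketch using only the standard library: it strips accents, replaces spaces and symbols with underscores, and lowercases the result. Stripping accents can change the meaning of a word in some languages, so translating the name by hand is still preferable when you control it.

```python
import re
import unicodedata


def to_ascii_name(name):
    # Split accented letters into base letter + combining mark (NFKD),
    # then drop everything that cannot be encoded as ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace spaces and any other non-alphanumeric run with one underscore.
    return re.sub(r"[^0-9a-zA-Z]+", "_", ascii_only).strip("_").lower()


columns = ["Próximo Período", "Teléfono móvil"]
clean_columns = [to_ascii_name(column) for column in columns]
# clean_columns is ["proximo_periodo", "telefono_movil"]
```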

🗂️ Use Version Control

You don’t want to lose all your work, and at some point you will make a mistake and want to go back. Keep your work in a version control system like Git; there are very good online courses.

# Basic commands (the push assumes a remote named origin and a branch named main)
git init
git add .
git commit -m "Add tax calculation function"
git push origin main

📊 Work with Data in Tidy Format

Modern data manipulation tools are designed around data in tidy format, also called "long" tables: each variable is a column and each observation is a row.

📖 Tidy Data – Wikipedia

❌ Bad – Wide format

person  weight_2020  weight_2021
Juan    70           72

✅ Good – Tidy/Long format

person  year  weight
Juan    2020  70
Juan    2021  72
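
As a rough sketch of the reshaping itself, the function below turns the wide table above into the long one using plain Python. The function name and its parameters are hypothetical; data libraries such as pandas offer an equivalent operation (for example DataFrame.melt), which is usually the better choice for real tables.

```python
def wide_to_long(rows, id_column, prefix, name_column, value_column):
    # Turn columns such as weight_2020 and weight_2021 into one row per year.
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                long_rows.append({
                    id_column: row[id_column],
                    # The part after the prefix is the year, e.g. "2020".
                    name_column: int(column[len(prefix):]),
                    value_column: value,
                })
    return long_rows


wide = [{"person": "Juan", "weight_2020": 70, "weight_2021": 72}]
tidy = wide_to_long(wide, "person", "weight_", "year", "weight")
# tidy holds one dictionary per person and year:
# {"person": "Juan", "year": 2020, "weight": 70}
# {"person": "Juan", "year": 2021, "weight": 72}
```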

🎨 Data Formatting Goes at the End

Keep objects and variables in the language’s native types (numbers, dates) and leave human-readable formatting until the very end.

# ❌ Bad - Formatting too early
price = 100
tax_rate = "16%"
result = float(tax_rate.replace("%", "")) / 100 * price  # unnecessary conversion

# ✅ Good - Work in native format
TAX_RATE = 0.16
result = TAX_RATE * price
print(f"Tax ({TAX_RATE:.0%}): {result:.2f}")  # format only at display time

Same with dates:

from datetime import date

# ✅ Work as Date object
transaction_date = date(2024, 1, 15)

# Format only when displaying
print(transaction_date.strftime("%d-%m-%Y"))  # 15-01-2024

📋 Generate an Execution Log

Wrapping the process in a try/catch block lets you log both successful runs and failures, for example in R:

#!/bin/Rscript
source("/opt/config/init.R") # script containing log function and several other things
tryCatch({
  # generally... process identified by destination table
  param = c(
    proc = 'process name'
  )

  ### do what's necessary here....

  write_log(param[['proc']], message = file)

}, error = function(e) {
  write_log(param[['proc']], exit_code = -1, message = paste(e$message, collapse = " "))
  print(e$message)
  }
)

Or in Python:

import logging

logging.basicConfig(filename='execution.log', level=logging.INFO)

try:
    logging.info("Process started")
    # main process goes here
    logging.info("Process finished successfully")

except Exception:
    # logging.exception records the message plus the full traceback
    logging.exception("Error in process")
    raise

⚙️ Parameters Don’t Belong in the Code

The code can have many parameters, but execution parameters, such as the execution date or the input file path, should not be hard-coded.

Changing an execution parameter should never require modifying the code. Pass these values through the command line or through some web interface:

In R:

args = commandArgs(trailingOnly = TRUE)

In Python:

import sys
args = sys.argv[1:]  # everything after the script name
input_path = args[0]  # first command-line argument

Even better, using argparse in Python:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--date', default='2024-01-01', help='Execution date')
parser.add_argument('--input', required=True, help='Input file path')
args = parser.parse_args()

print(f"Processing file: {args.input} for date: {args.date}")

🤔 Final Question 😊

What did you think? Can you think of any more?
Leave it in the comments.

Cheers! 🚀
