Best Practices When Programming (with code)
Many people enjoy programming and solving algorithmic mazes in their favorite language, but nobody likes diving into old code or, even worse, code written by someone else.
I’ve had to do it several times. Sometimes it was bad, other times worse, and I often ended up rewriting the parts that made my eyes bleed. That’s how I learned some best practices for programming.
Warning: This comes from a Data Scientist’s perspective, so these cases generally apply to ETL and Machine Learning models.
💭 Think Before You Code
Many times I’ve seen people receive a task, sit down, and start pressing keys to accomplish what was requested. That generally produces very bad code.
The lines of code reflect the logic the developer used to solve the problem, so if that logic isn’t clear to them, the code will be a difficult thread to follow.
Working out the logic first will help you name variables, write comments and divide the code into parts.
▶️ Code Should Run Sequentially
With the rise of interactive languages such as Python, R and Julia, I’ve often seen scripts that only work when run in pieces, something like: run lines 10 to 35, then lines 1 and 2, then take the result and paste it into line 40.
That’s not programming, it’s spreadsheet-ing in R or Python.
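A minimal sketch of a script that runs top to bottom in a single pass. The function names and the sample numbers are hypothetical, made up only to illustrate the idea:

```python
def load_weights():
    # Hypothetical stand-in for reading a file or a database table.
    return [70, 72, 68]


def average(values):
    return sum(values) / len(values)


def main():
    # Each step consumes the output of the previous one, so running the
    # whole file is the only way the script is ever meant to be used.
    weights = load_weights()
    mean_weight = average(weights)
    print(f"Mean weight: {mean_weight:.1f}")
    return mean_weight


if __name__ == "__main__":
    main()
```

If the file only works when someone selects lines by hand, that is a sign a step is missing from `main`.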
🔄 If It’s an ETL, Give It an ETL Structure
Try to use predefined structures. When you process data, many of those processes will be ETL (extract, transform, load). If possible, follow these steps:
- Environment configuration script
- Libraries
- Load Data
- Manipulate data
- Save the result
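The steps above could be laid out roughly like this in a single Python file. This is only a sketch: the file names, the column names (`item`, `price`) and the function names are assumptions for illustration, and the imports come first because that is the Python convention.

```python
# Libraries
import csv

# Environment configuration (hypothetical paths and settings)
INPUT_PATH = "input.csv"
OUTPUT_PATH = "output.csv"
TAX_RATE = 0.16


# Load data
def extract(path):
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Manipulate data
def transform(rows):
    return [
        {"item": row["item"], "tax": round(float(row["price"]) * TAX_RATE, 2)}
        for row in rows
    ]


# Save the result
def load(rows, path):
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["item", "tax"])
        writer.writeheader()
        writer.writerows(rows)


def run_etl(input_path=INPUT_PATH, output_path=OUTPUT_PATH):
    load(transform(extract(input_path)), output_path)
```

Calling `run_etl()` once the input file exists runs the three stages in order.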
✂️ Divide the Code Into Parts
Try to have each script perform one particular task and save its result to some storage for the next stage. The only thing two stages should have in common is that the later one reads the data the earlier one saved.
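As a rough sketch of that idea, here are two stages that share nothing except a file. In a real project each function would probably live in its own script; the function names, the JSON format and the sample rows are assumptions of mine:

```python
import json
import tempfile
from pathlib import Path


def stage_clean(raw_rows, out_path):
    # Stage 1: drop rows without a weight and persist the cleaned data.
    cleaned = [row for row in raw_rows if row.get("weight") is not None]
    Path(out_path).write_text(json.dumps(cleaned), encoding="utf-8")


def stage_summarize(in_path):
    # Stage 2: knows nothing about stage 1 except the file it left behind.
    rows = json.loads(Path(in_path).read_text(encoding="utf-8"))
    return sum(row["weight"] for row in rows) / len(rows)


# Hypothetical sample data; a temporary directory stands in for real storage.
raw = [
    {"person": "Juan", "year": 2020, "weight": 70},
    {"person": "Juan", "year": 2021, "weight": 72},
    {"person": "Juan", "year": 2022, "weight": None},
]
with tempfile.TemporaryDirectory() as tmp:
    intermediate = Path(tmp) / "clean_weights.json"
    stage_clean(raw, intermediate)
    print(stage_summarize(intermediate))  # 71.0
```

Because the second stage only reads the saved file, it can be rerun or debugged without repeating the first one.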
🏷️ Give Variables Good Names
It will help you return to the code in the future:
- Name functions as verbs that describe what they do
- Name objects with nouns that describe them
- If you have an object that represents a big apple, name it big_apple. If you don’t like that standard, use another like bigApple, but be consistent
- If you have tables with 2 columns that are the same value, give them the same name
# ❌ Bad
def data(x):
    return x * 0.16
t = data(100)
# ✅ Good
def calculate_tax(price):
    return price * 0.16
total_tax = calculate_tax(100)
🔤 Don’t Use Non-ASCII Characters
Don’t create variables with characters like ñ or áéíóú or even spaces. At some point it will be a problem.
# ❌ Bad
año_fiscal = 2024
próximo_año = 2025
# ✅ Good
fiscal_year = 2024
next_year = 2025
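The same problem shows up with column names that arrive from someone else's file. One possible way to clean them, sketched with the standard library only (the helper name `to_ascii_name` and the sample headers are hypothetical):

```python
import re
import unicodedata


def to_ascii_name(name):
    # Decompose accented characters (ó -> o + combining accent), then drop
    # everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace runs of spaces and other non-word characters with "_".
    return re.sub(r"\W+", "_", ascii_only).strip("_").lower()


print(to_ascii_name("Código Postal"))   # codigo_postal
print(to_ascii_name("Teléfono móvil"))  # telefono_movil
```

Characters with no ASCII decomposition are simply dropped by this sketch, so it is worth checking the result for collisions between names.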
🗂️ Use Version Control
You don’t want to lose all your work, and at some point you will make a mistake and want to go back. Keep your work under version control with something like Git; there are very good online courses.
# Basic commands
git init
git add .
git commit -m "Add tax calculation function"
git push origin main
📊 Work with Data in Tidy Format
Modern data manipulation tools are designed around data in tidy format, also known as "long" tables.
❌ Bad – Wide format
| person | weight_2020 | weight_2021 |
|---|---|---|
| Juan | 70 | 72 |
✅ Good – Tidy/Long format
| person | year | weight |
|---|---|---|
| Juan | 2020 | 70 |
| Juan | 2021 | 72 |
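To make the reshaping concrete, here is a small pure-Python sketch that turns the wide table above into the long one. The function `wide_to_long` and its parameters are my own illustration; with pandas, `DataFrame.melt` covers the same kind of reshaping.

```python
def wide_to_long(rows, id_column, prefix):
    # Turn columns such as weight_2020 into one (year, weight) row each.
    value_name = prefix.rstrip("_")
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix):
                long_rows.append({
                    id_column: row[id_column],
                    "year": int(column[len(prefix):]),
                    value_name: value,
                })
    return long_rows


wide = [{"person": "Juan", "weight_2020": 70, "weight_2021": 72}]
for row in wide_to_long(wide, "person", "weight_"):
    print(row)
```

Adding a new year to the long table means adding rows, not columns, so code that filters or groups by `year` keeps working unchanged.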
🎨 Data Formatting Goes at the End
Keep objects and variables in the language’s native types while you work, and leave the formatting for the human eye until the end.
# ❌ Bad - Formatting too early
price = 100
tax_rate = "16%"
result = float(tax_rate.replace("%", "")) / 100 * price  # unnecessary conversion
# ✅ Good - Work in native format
TAX_RATE = 0.16
result = TAX_RATE * price
print(f"Tax ({TAX_RATE:.0%}): {result:.2f}")  # format only at display time
Same with dates:
from datetime import date
# ✅ Work as Date object
transaction_date = date(2024, 1, 15)
# Format only when displaying
print(transaction_date.strftime("%d-%m-%Y")) # 15-01-2024
📋 Generate an Execution Log
With a try and catch you can generate a log, for example in R:
#!/bin/Rscript
source("/opt/config/init.R") # script containing log function and several other things
tryCatch({
  # generally... process identified by destination table
  param = c(
    proc = 'process name'
  )
  ### do what's necessary here....
  write_log(param[['proc']], message = file)
}, error = function(e) {
  write_log(param[['proc']], exit_code = -1, message = paste(e$message, collapse = " "))
  print(e$message)
}
)
Or in Python:
import logging
logging.basicConfig(filename='execution.log', level=logging.INFO)
try:
    # main process
    logging.info("Process started successfully")
except Exception as e:
    logging.error(f"Error in process: {str(e)}")
    raise
⚙️ Parameters Don’t Belong in the Code
The code can have many parameters, but execution parameters, such as the execution date or the input file path, should be passed in from outside rather than hard-coded.
Changing an execution parameter should never require modifying the code. Pass these values through the console or through some web interface:
In R:
args = commandArgs(trailingOnly = TRUE)
In Python:
import sys
args = sys.argv[1:]
file = args[0]
Even better, using argparse in Python:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--date', default='2024-01-01', help='Execution date')
parser.add_argument('--input', required=True, help='Input file path')
args = parser.parse_args()
print(f"Processing file: {args.input} for date: {args.date}")
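One possible refinement, combining this with the advice about native formats: let argparse turn the date string into a `date` object at parse time. This is a sketch under my own assumptions (the `build_parser` helper and the `sales.csv` file name are hypothetical); `date.fromisoformat` is part of the standard library.

```python
import argparse
from datetime import date


def build_parser():
    parser = argparse.ArgumentParser(description="Hypothetical ETL runner")
    # date.fromisoformat turns "2024-01-15" into a date object at parse time,
    # and argparse reports a clean error if the string is not a valid date.
    parser.add_argument("--date", type=date.fromisoformat,
                        default=date(2024, 1, 1),
                        help="Execution date (YYYY-MM-DD)")
    parser.add_argument("--input", required=True, help="Input file path")
    return parser


# Passing a list makes the parser easy to try without a real command line.
args = build_parser().parse_args(["--input", "sales.csv", "--date", "2024-01-15"])
print(args.date.strftime("%d-%m-%Y"), args.input)  # 15-01-2024 sales.csv
```

In a real script you would call `parse_args()` with no arguments so the values come from the console.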
🤔 Final Question 😊
What did you think? Can you think of any more?
Leave it in the comments.
Cheers! 🚀
