Normalize Data with Rails 7.1

rails · August 01, 2023 · by Alkesh Ghorpade

Storing user input directly in a database can be a security risk. Malicious users can inject harmful code into the database, which could lead to data breaches or other problems. To protect your database, you should sanitize (remove HTML tags from user input to prevent cross-site scripting attacks) or standardize (convert the data into a consistent format) the data before saving it.

Before Rails 7.1

Before Rails 7.1, before_save or before_validation callbacks were the usual way to ensure data was correctly formatted and sanitized. The most basic use case is normalizing the user's email address before it is stored by the Rails application.

1. Using before_save or before_validation callbacks

With before_save or before_validation you can implement the normalization flow as below:

class User < ApplicationRecord
  before_save :sanitize_email

  private

  # strip and downcase return new strings, so assign the result back.
  def sanitize_email
    self.email = email.strip.downcase
  end
end

## or

class User < ApplicationRecord
  before_validation :sanitize_email

  private

  # strip and downcase return new strings, so assign the result back.
  def sanitize_email
    self.email = email.strip.downcase
  end
end
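
One detail matters for callbacks like these: String#strip and String#downcase return new strings rather than modifying the receiver, so the callback only has an effect if the result is assigned back to the attribute. A minimal plain-Ruby sketch of that behavior (the RAW_EMAIL constant and the normalized_email helper are hypothetical names used only for illustration):

```ruby
# Plain Ruby; no Rails is needed to see the behavior.
# The string is frozen to make it obvious that nothing mutates it.
RAW_EMAIL = "\n  sAm@SAmpLE.CoM".freeze

# Hypothetical helper: returns a normalized COPY of the value.
# The argument itself is left untouched.
def normalized_email(value)
  value.strip.downcase
end

copy = normalized_email(RAW_EMAIL)
puts copy.inspect       # the normalized copy
puts RAW_EMAIL.inspect  # the original, unchanged
```

This is why a callback body should look like `self.email = email.strip.downcase` rather than a bare `email.strip.downcase`.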

2. Using attribute setter

Another way to normalize the data is to use the setter methods as follows:

class User < ApplicationRecord
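  def email=(value)
    # Call super to write the attribute; `self.email = ...` here would call this setter again and recurse forever.
    super(value.strip.downcase)
  end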
end
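
When overriding a setter, the method has to write to the underlying storage instead of calling itself. The sketch below uses a plain Ruby class, PlainUser (a hypothetical stand-in, not an Active Record model), that writes to an instance variable. In an Active Record model, the equivalent write would typically go through super.

```ruby
# PlainUser is a hypothetical stand-in for a model, used only for illustration.
class PlainUser
  attr_reader :email

  # Write straight to the underlying storage (@email here).
  # Calling `self.email = ...` inside this method would invoke the
  # same setter again and never terminate.
  def email=(value)
    # Safe navigation keeps nil as nil instead of raising NoMethodError.
    @email = value&.strip&.downcase
  end
end

user = PlainUser.new
user.email = "\n  sAm@SAmpLE.CoM"
puts user.email
```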

3. Using normalize gem

If you want to avoid using the above two methods, you can use the normalize gem. As the gem README explains, you must create an app/normalizers directory in your Rails application. The directory will contain your normalization classes. For the above use case, an email normalizer class can be created as below:

class EmailNormalizer
  def self.call(email)
    email.strip.downcase
  end
end

You need to specify the EmailNormalizer to run on the email attribute in your User model as below:

class User < ApplicationRecord
  normalize :email, with: EmailNormalizer
end

In Rails 7.1

Rails 7.1 adds the normalizes method to ActiveRecord::Base, which can be used to declare normalization for attribute values. The method can be used to sanitize user inputs, enforce consistent formatting and clean up data from external sources.

The normalizes method takes two kinds of arguments:

  • the name (or names) of the attribute to be normalized
  • a with: option that defines the normalization logic. It accepts any callable, such as a lambda, which receives the attribute value and returns the normalized value.
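
Because the normalization logic is passed via with:, it does not have to be an inline lambda. The Rails API documentation describes with: as any callable object that takes the attribute's value as its only argument. The plain-Ruby sketch below shows that a lambda and an object with a call method are interchangeable from the caller's point of view. StripDowncase, STRIP_DOWNCASE_LAMBDA and apply_normalizer are hypothetical names used only for illustration.

```ruby
# Hypothetical normalizer object: anything that responds to `call` qualifies.
class StripDowncase
  def self.call(value)
    value.strip.downcase
  end
end

# The same logic written as a lambda.
STRIP_DOWNCASE_LAMBDA = ->(value) { value.strip.downcase }

# Hypothetical helper: invokes whichever callable it is given.
def apply_normalizer(callable, value)
  callable.call(value)
end

puts apply_normalizer(StripDowncase, "  SAM@Sample.com ")
puts apply_normalizer(STRIP_DOWNCASE_LAMBDA, "  SAM@Sample.com ")
```

Under that assumption, a model could declare something like `normalizes :email, with: StripDowncase` to reuse one normalizer across several models.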

For example, the following code normalizes the email attribute by downcasing it and stripping surrounding whitespace:

class User < ApplicationRecord
  normalizes :email, with: -> email { email.downcase.strip }
end

> user = User.create(email: "\n  sAm@SAmpLE.CoM")
> user.email
 => "sam@sample.com"

When a new user is created, or an existing user's email address is updated, the normalizes block runs on the email address as it is assigned, so the value that reaches the database is already stripped and downcased.

Normalize multiple attributes

The normalizes method can also be used to normalize multiple attributes at once. For example, the following code normalizes the email and the username attributes:

class User < ActiveRecord::Base
  normalizes :email, :username, with: -> attribute { attribute.strip.downcase }
end
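
To make the mechanics more concrete, here is a toy, plain-Ruby sketch of a normalizes-style macro that wraps the setters of several attributes. ToyNormalization and ToyUser are hypothetical names, and this is not how Active Record implements the feature. It only illustrates the idea of running one callable on every listed attribute at assignment time.

```ruby
# ToyNormalization is a hypothetical module for illustration only;
# it is NOT the Active Record implementation of `normalizes`.
module ToyNormalization
  def normalizes(*names, with:)
    names.each do |name|
      attr_reader name

      # Define a setter that runs the callable on assignment.
      # nil is passed through untouched in this toy version.
      define_method("#{name}=") do |value|
        normalized = value.nil? ? nil : with.call(value)
        instance_variable_set("@#{name}", normalized)
      end
    end
  end
end

class ToyUser
  extend ToyNormalization

  normalizes :email, :username, with: ->(value) { value.strip.downcase }

  def initialize(**attributes)
    attributes.each { |name, value| public_send("#{name}=", value) }
  end
end

user = ToyUser.new(email: "\n  sAm@SAmpLE.CoM", username: "  SamSmith ")
puts user.email
puts user.username
```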

Normalize nil values

By default, normalization is not applied to nil values. If you pass nil as the username (assuming it is not mandatory), the normalizer is skipped and the attribute stays nil. Without this guard, nil.strip would raise NoMethodError: undefined method 'strip' for nil:NilClass.

> user = User.create(email: "\n  sAm@SAmpLE.CoM", username: nil)
> user.email
 => "sam@sample.com"

> user.username
 => nil

The nil behaviour can be changed by setting apply_to_nil to true. apply_to_nil is false by default.

class User < ActiveRecord::Base
  normalizes :username, with: -> username { username&.downcase&.titleize || 'No username' }, apply_to_nil: true
end

> user = User.create(email: "\n  sAm@SAmpLE.CoM", username: nil)
> user.email
 => "sam@sample.com"

> user.username
 => "No username"
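
The difference between the two modes can be sketched in plain Ruby. The toy_normalize helper below is hypothetical and only mirrors the behavior described above; it is not Rails internals. With apply_to_nil left at false the callable is never invoked for nil, and with apply_to_nil: true the callable must handle nil itself.

```ruby
# Hypothetical helper mirroring the described behavior, not Rails internals.
def toy_normalize(value, with:, apply_to_nil: false)
  # Default: skip the callable entirely for nil values.
  return nil if value.nil? && !apply_to_nil

  with.call(value)
end

# Assumes a String; would raise NoMethodError if it ever received nil.
STRICT_NORMALIZER = ->(username) { username.strip.downcase }

# Handles nil itself via safe navigation and a fallback value.
NIL_SAFE_NORMALIZER = ->(username) { username&.strip&.downcase || "No username" }

puts toy_normalize(nil, with: STRICT_NORMALIZER).inspect
puts toy_normalize(nil, with: NIL_SAFE_NORMALIZER, apply_to_nil: true)
```

When opting in with apply_to_nil: true, the callable should use safe navigation (or an explicit nil check), as the username example above does.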

Normalization process

Normalization is applied when the attribute is assigned or updated. The normalization is also applied to the corresponding keyword argument of finder methods. This enables the creation of a record and subsequent querying using unnormalized values.

> User.create!(email: "\n  sAm@SAmpLE.CoM")

#<User:0x000000001x987261 id: 1, email: "sam@sample.com",
created_at: Fri, 11 June 2023 00:00:20.058984000 UTC +00:00,
updated_at: Fri, 11 June 2023 00:00:20.058984000 UTC +00:00>

> user = User.find_by!(email: "\n  sAm@SAmpLE.CoM")
> user.email
 => "sam@sample.com"

> User.exists?(email: "\n  sAm@SAmpLE.CoM")
 => true

Note: Normalization is not applied when you pass the attribute value in a raw SQL condition.

> User.exists?(["email = ?", "\n  sAm@SAmpLE.CoM"])
 => false
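
The contrast between the two lookups can be mimicked with a small in-memory sketch. ToyUserStore is a hypothetical class that only imitates the idea: attribute-style lookups pass through the normalizer, while a raw comparison uses the value exactly as given. It says nothing about how Active Record builds its queries.

```ruby
# ToyUserStore is hypothetical; it only imitates the idea with an array.
class ToyUserStore
  NORMALIZE_EMAIL = ->(email) { email.strip.downcase }

  def initialize
    @emails = []
  end

  # Values are normalized on the way in...
  def create(email:)
    @emails << NORMALIZE_EMAIL.call(email)
  end

  # ...and attribute-style conditions are normalized before comparing.
  def exists?(email:)
    @emails.include?(NORMALIZE_EMAIL.call(email))
  end

  # A raw comparison bypasses the normalizer, like a hand-written SQL fragment.
  def exists_raw?(email)
    @emails.include?(email)
  end
end

store = ToyUserStore.new
store.create(email: "\n  sAm@SAmpLE.CoM")
puts store.exists?(email: "\n  sAm@SAmpLE.CoM")
puts store.exists_raw?("\n  sAm@SAmpLE.CoM")
```

If a raw condition is unavoidable, normalizing the value yourself before interpolating it keeps both paths consistent.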

Normalize existing records

If a user's email was stored in the database before normalization was added to the model, it will not be returned in the normalized format.

## user created with email "sAm@SAmpLE.CoM"

> user = User.find(1)
> user.email
 => "sAm@SAmpLE.CoM"

This means that the attributes of existing records are not normalized automatically. You can normalize them explicitly using the Normalization#normalize_attribute method.

class User < ActiveRecord::Base
  normalizes :email, with: -> email { email&.downcase&.strip }
end

### Migration to normalize existing records
User.find_each do |legacy_user|
  legacy_user.normalize_attribute(:email)
  legacy_user.save
end
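
For a large table it can be useful to know which rows a backfill actually touched. The plain-Ruby sketch below uses a Struct, LegacyRow, as a hypothetical stand-in for persisted records and returns the ids whose email changed. The names NORMALIZE_LEGACY_EMAIL and backfill_emails are also assumptions made for this example.

```ruby
# LegacyRow is a hypothetical stand-in for a persisted record.
LegacyRow = Struct.new(:id, :email)

# nil-safe normalizer, matching the safe-navigation style used above.
NORMALIZE_LEGACY_EMAIL = ->(email) { email&.strip&.downcase }

# Normalizes each row in place and returns the ids of rows that changed.
def backfill_emails(rows)
  rows.each_with_object([]) do |row, changed_ids|
    normalized = NORMALIZE_LEGACY_EMAIL.call(row.email)
    next if normalized == row.email # already normalized (or nil); skip

    row.email = normalized
    changed_ids << row.id
  end
end

rows = [
  LegacyRow.new(1, "sAm@SAmpLE.CoM"),
  LegacyRow.new(2, "ok@sample.com"),
  LegacyRow.new(3, nil)
]
puts backfill_emails(rows).inspect
```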

Normalize using a class method

You can call the normalize_value_for method at the class level to normalize a value for a given attribute as below:

> User.normalize_value_for(:email, "\n  sAm@SAmpLE.CoM")
 => "sam@sample.com"

To learn more about this feature, please refer to this PR.
