Golang notes no. 2

May 22, 2023

database/sql

When you're using the sql.DB abstraction in Go to interact with a database, it manages concurrent access to the underlying datastore for you. When you use a connection to perform a task, it's marked as in-use and then returned to the available pool when you're done. But here's the catch: if you forget to release connections back to the pool, you can end up with a ton of open connections, which can lead to resource issues like too many open connections, exhausted file handles, or a lack of available network ports.

Now, here's something that might surprise you. When you call sql.Open() to establish a connection to the database, it doesn't actually open any connections or validate the connection parameters right away. Instead, it prepares the database abstraction for later use. The actual connection to the database will be established lazily when it's needed for the first time. So, if you want to check immediately whether the database is accessible, you can use db.Ping(). It helps you validate the connection and ensure everything is in order. Just remember to check for errors like this:

err = db.Ping()
if err != nil {
	// handle the error here
}
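
Putting those two together, here's a minimal sketch of opening a handle and verifying connectivity up front. It assumes the MySQL driver github.com/go-sql-driver/mysql and a placeholder DSN; swap in whatever driver and connection string you actually use:

package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // registers the "mysql" driver
)

func main() {
	// Open() only validates the DSN format and prepares the pool;
	// no connection is made yet.
	db, err := sql.Open("mysql", "user:password@/dbname")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Ping() forces an actual connection, surfacing bad credentials
	// or an unreachable server immediately.
	if err := db.Ping(); err != nil {
		log.Fatal(err)
	}
}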

Now, here's an important tip. Although it's a good practice to close the database when you're done with it, the sql.DB object is designed to be long-lived. So, you shouldn't open and close databases frequently. Instead, create just one sql.DB object for each distinct datastore you need to access and keep it open until your program is finished with that datastore. You can pass it around as needed or make it available globally, but keep it open. And avoid opening and closing the database within short-lived functions. Instead, pass the sql.DB object as an argument to those functions.

If you don't treat the sql.DB as a long-lived object, you might run into problems like poor connection reuse, running out of network resources, or sporadic failures due to a bunch of TCP connections remaining in the TIME_WAIT status. These issues indicate that you're not using the database/sql package as intended.
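
For example, a sketch of the pass-it-around pattern (the getUserCount function and users table are hypothetical placeholders):

func getUserCount(db *sql.DB) (int, error) {
	// The function borrows the long-lived handle; it never opens
	// or closes the database itself.
	var count int
	err := db.QueryRow("SELECT COUNT(*) FROM users").Scan(&count)
	return count, err
}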

Now, let's talk about fetching data from the database. Imagine you have a users table, and you want to query it for a user with the ID 1 and print their ID and name. Here's an example of how to do it using rows.Scan():

var (
	id   int
	name string
)

rows, err := db.Query("SELECT id, name FROM users WHERE id = ?", 1)
if err != nil {
	log.Fatal(err)
}
defer rows.Close()

for rows.Next() {
	err := rows.Scan(&id, &name)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(id, name)
}

err = rows.Err()
if err != nil {
	log.Fatal(err)
}

Now, a couple of things can go wrong here, so pay attention. Always check for errors at the end of the for rows.Next() loop. If there's an error during the loop, you need to be aware of it. Don't assume that the loop iterates until you've processed all the rows.

Another important point: as long as you have an open result set (represented by rows), the underlying connection is busy and can't be used for any other query; it's not available in the connection pool. If you iterate over all the rows using rows.Next(), the last call to rows.Next() will encounter an internal EOF error and automatically call rows.Close() for you. However, if you exit the loop prematurely (maybe due to an early return), the rows won't get closed and the connection will remain open. This can lead to resource exhaustion.

To prevent these issues, it's crucial to always call rows.Close() to release the resources associated with the result set. It's safe to call rows.Close() multiple times, so you can include it in a deferred statement at the end of the function. However, it's also a good practice to check for errors before calling rows.Close() to avoid runtime panics.

Additionally, it's essential to use defer rows.Close() even if you explicitly call rows.Close() at the end of the loop. This ensures that the result set is closed regardless of how you exit the loop.

Here's an important tip: avoid using defer within a loop. Since a deferred statement executes when the function exits, using defer in a long-running function can lead to a memory buildup. If you're repeatedly querying and consuming result sets within a loop, it's best to explicitly call rows.Close() when you're done with each result and avoid relying on defer.
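
If you're querying repeatedly inside a loop, a sketch of that explicit-close pattern might look like this (the ids slice and users table are assumed for illustration):

for _, id := range ids {
	rows, err := db.Query("SELECT name FROM users WHERE id = ?", id)
	if err != nil {
		log.Fatal(err)
	}
	for rows.Next() {
		var name string
		if err := rows.Scan(&name); err != nil {
			rows.Close()
			log.Fatal(err)
		}
		log.Println(name)
	}
	if err := rows.Err(); err != nil {
		rows.Close()
		log.Fatal(err)
	}
	// Close explicitly instead of deferring, so the connection goes
	// back to the pool before the next iteration.
	rows.Close()
}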

By following these practices, you can ensure proper management of connections, avoid resource problems, and use the database/sql package effectively in your Go applications.

Single-Row Queries

When you're expecting a query to return only a single row, there's a handy shortcut to avoid some of the lengthy code. Instead of using the db.Query() method and iterating over the result set, you can use db.QueryRow() to directly retrieve the values into variables.

Here's an example:

var name string
err = db.QueryRow("select name from users where id = ?", 1).Scan(&name)
if err != nil {
    log.Fatal(err)
}
fmt.Println(name)

In this code, the QueryRow() method executes the query and retrieves a single row, and Scan() copies the result into the provided variables. Any error that occurred during the query is deferred until Scan() is called, so that's where you handle it. In particular, if the query matches no rows, Scan() returns the sentinel error sql.ErrNoRows, which you'll usually want to treat differently from a genuine failure.
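
Here's a sketch of handling that case with errors.Is (assuming the standard errors package is imported):

var name string
err := db.QueryRow("select name from users where id = ?", 1).Scan(&name)
switch {
case errors.Is(err, sql.ErrNoRows):
	// No matching row; often an expected outcome rather than a failure.
	log.Println("no user with that id")
case err != nil:
	log.Fatal(err)
default:
	fmt.Println(name)
}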

You can also use QueryRow() with prepared statements for better performance and reusability. Here's an example:

stmt, err := db.Prepare("select name from users where id = ?")
if err != nil {
    log.Fatal(err)
}
defer stmt.Close()

var name string
err = stmt.QueryRow(1).Scan(&name)
if err != nil {
    log.Fatal(err)
}
fmt.Println(name)

In this case, we first prepare the statement using db.Prepare(), which compiles the SQL query into a prepared statement. Then, we can call QueryRow() on the prepared statement to execute the query and retrieve the result. The rest of the code remains the same.

Using these shortcuts can make your code more concise and readable when dealing with single-row queries. It simplifies the process of retrieving a single value without the need for manual iteration over the result set.

EXEC vs QUERY

Let's say you want to execute a statement in your database without caring about the result, but just want to check if any errors occurred. You might think that using either db.Exec() or db.Query() would achieve the same result. However, that's not the case, and it's important to understand the difference between them.

Take a look at these two examples:

_, err := db.Exec("DELETE FROM users") // OK
_, err := db.Query("DELETE FROM users") // BAD

Although they may seem similar, using db.Query() in this way is considered a mistake, and you should never do it.

Here's why: When you use db.Query(), it returns a sql.Rows object, which represents the result set of your query. The connection to the database remains reserved until you explicitly close the sql.Rows. This is because there might be additional data rows that you haven't read yet, so the connection cannot be released.

In the example above, using db.Query() to execute a delete statement means the connection will never be released. This can lead to resource exhaustion, such as having too many open connections, which can cause performance issues or even application failure.

On the other hand, using db.Exec() is the appropriate way to execute statements that don't return any rows. It doesn't return a result set like db.Query(), so there's no need to handle or close any rows. It simply executes the statement and returns any error that occurred during the execution. This ensures that the connection is released and can be reused.
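
Note that db.Exec() does return a sql.Result, which you can inspect if you care about the outcome; a quick sketch:

res, err := db.Exec("DELETE FROM users WHERE id = ?", 1)
if err != nil {
	log.Fatal(err)
}
// RowsAffected reports how many rows the statement changed.
n, err := res.RowsAffected()
if err != nil {
	log.Fatal(err)
}
log.Printf("deleted %d row(s)", n)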

By using db.Exec() correctly, you avoid tying up database connections unnecessarily and prevent resource leaks. This is crucial for maintaining the performance and stability of your application.

So, to summarize, always use db.Exec() when you want to execute a statement without expecting any rows in return. Avoid using db.Query() for this purpose, as it can lead to resource exhaustion and other potential issues.

Working with Transactions

In Go, transactions play an important role in ensuring consistent and reliable interactions with a database. Think of a transaction as an object that reserves a connection to the datastore, allowing you to perform a series of operations on the same connection.

To begin a transaction, you simply call db.Begin(), which returns a Tx variable representing the transaction. This transaction can then be closed using either the Commit() or Rollback() method. Under the hood, the Tx object obtains a connection from the connection pool and reserves it exclusively for that transaction. The methods available on the Tx object mirror the ones you can use directly on the database, such as Query() and others.
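
The basic shape is worth seeing once. Here's a minimal sketch, assuming a hypothetical accounts table:

tx, err := db.Begin()
if err != nil {
	log.Fatal(err)
}
// Rollback is a no-op (it returns sql.ErrTxDone) if Commit has
// already succeeded, so deferring it is a safe cleanup net.
defer tx.Rollback()

_, err = tx.Exec("UPDATE accounts SET balance = balance - 100 WHERE id = ?", 1)
if err != nil {
	log.Fatal(err)
}
_, err = tx.Exec("UPDATE accounts SET balance = balance + 100 WHERE id = ?", 2)
if err != nil {
	log.Fatal(err)
}

if err := tx.Commit(); err != nil {
	log.Fatal(err)
}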

It's important to note that prepared statements created within a transaction are bound exclusively to that transaction. This ensures that the statements are executed within the transaction's context. For more details on prepared statements, you can refer to the relevant documentation.

Here's a crucial point to remember: do not mix transaction-related functions like Begin() and Commit() with SQL statements like BEGIN and COMMIT in your SQL code. Doing so can lead to undesirable consequences:

  • The Tx objects might remain open, holding onto a connection from the pool without releasing it back.
  • The state of the database could become inconsistent with the state of the Go variables representing it.
  • You might mistakenly believe you're executing queries on a single connection within a transaction, when in reality, Go silently creates multiple connections, and some statements end up outside the transaction.

When working within a transaction, you should also avoid making calls directly to the db variable. Make all your calls on the Tx variable created with db.Begin(): only the Tx object is inside the transaction, not the db itself. If you make further calls to db.Exec() or similar methods, they will happen outside the scope of your transaction, on separate connections.

Sometimes, you might need to perform multiple statements that modify the connection state, even if you don't necessarily require a transaction. In such cases, you still need to bind your activity to a single connection, and the only way to achieve that in Go is by using a Tx.

For example, if you need to create temporary tables that are only visible to a single connection, set variables (e.g., MySQL's SET @var := somevalue syntax), or change connection options such as character sets or timeouts, you must use a Tx to ensure that these operations are carried out consistently on the same connection.
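
As a sketch of that last scenario (the @region session variable and orders table are made up for illustration):

tx, err := db.Begin()
if err != nil {
	log.Fatal(err)
}
defer tx.Rollback()

// Both statements run on the same underlying connection, so the
// session variable set here is visible to the query below.
if _, err := tx.Exec("SET @region := 'EU'"); err != nil {
	log.Fatal(err)
}
var count int
if err := tx.QueryRow("SELECT COUNT(*) FROM orders WHERE region = @region").Scan(&count); err != nil {
	log.Fatal(err)
}

if err := tx.Commit(); err != nil {
	log.Fatal(err)
}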

By following these guidelines, you can effectively work with transactions in Go, ensuring proper connection management and maintaining the integrity of your database operations.

Prepared Statements

When you prepare a statement in Go, it's associated with a specific connection in the connection pool. The Stmt object keeps track of which connection was used for its preparation.

Now, when you actually execute that prepared statement using Stmt.Exec() or Stmt.Query(), it tries to utilize the same connection it was prepared on. However, if that connection is currently unavailable because it's either closed or occupied with another task, the statement will be executed on a different connection. In this case, the statement will be re-prepared with the database on the newly obtained connection.

This behavior is important to ensure that statements can still be executed even when the original connection is busy. However, it also means that in scenarios with high concurrency where many connections are occupied, a significant number of prepared statements can be created. This can lead to a few potential issues:

  • Apparent leaks of statements: Due to the re-preparation of statements on different connections, it may seem like statements are leaking or not being properly released.
  • Excessive re-preparation: Statements might end up being prepared and re-prepared more frequently than you expect. This can impact performance and efficiency.
  • Server-side limits on statements: Some database servers impose limits on the maximum number of statements that can be created. With high concurrency and frequent re-preparation, you could potentially hit these limits, causing errors or degraded performance.

It's essential to be aware of these possibilities when working with prepared statements in Go, especially in highly concurrent scenarios. Properly managing your connections, monitoring statement usage, and ensuring efficient reuse of prepared statements can help mitigate these issues and ensure smooth and optimal database interactions.

Prepared Statements in Transactions

Prepared statements created within a transaction are exclusively bound to that transaction. This means that any actions you perform on a Tx object directly correspond to the one and only connection underlying it.

It's crucial to note that prepared statements created inside a transaction cannot be used separately from that transaction. Similarly, prepared statements created on a DB object cannot be used within a transaction because they will be associated with a different connection.

If you have a prepared statement that was created outside of a transaction (DB), but you want to use it within a transaction, you can utilize Tx.Stmt(). This method allows you to create a new transaction-specific statement from the existing prepared statement. However, it comes with some drawbacks. It sets the connection to that of the transaction and re-prepares the statements every time they are executed. This behavior is not ideal and there are plans to improve it in the future. Therefore, it is recommended to avoid using Tx.Stmt() if possible.
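
For completeness, a sketch of what using Tx.Stmt() looks like (the foo table is assumed, as in the examples below):

// Prepared on the DB, outside any transaction.
stmt, err := db.Prepare("INSERT INTO foo VALUES (?)")
if err != nil {
	log.Fatal(err)
}
defer stmt.Close()

tx, err := db.Begin()
if err != nil {
	log.Fatal(err)
}
defer tx.Rollback()

// Tx.Stmt returns a transaction-specific copy bound to tx's
// connection; it is closed when the transaction ends.
txStmt := tx.Stmt(stmt)
if _, err := txStmt.Exec(42); err != nil {
	log.Fatal(err)
}

if err := tx.Commit(); err != nil {
	log.Fatal(err)
}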

It's crucial to exercise caution when working with prepared statements in transactions. Let's take a look at an example to illustrate this:

tx, err := db.Begin()
if err != nil {
	log.Fatal(err)
}
defer tx.Rollback()
stmt, err := tx.Prepare("INSERT INTO foo VALUES (?)")
if err != nil {
	log.Fatal(err)
}
defer stmt.Close() // Danger zone!
for i := 0; i < 10; i++ {
	_, err = stmt.Exec(i)
	if err != nil {
		log.Fatal(err)
	}
}
err = tx.Commit()
if err != nil {
	log.Fatal(err)
}
// stmt.Close() runs here!

In this example, we start a transaction (tx) and prepare a statement for inserting values into a table. However, pay close attention to the defer stmt.Close() line. Since it's deferred, it will be executed at the end of the enclosing function, after the transaction has been committed or rolled back. This means that the statement will be closed after the transaction is completed, which can lead to unexpected behavior and errors.

To avoid this issue, it's best practice to close the statement immediately after you no longer need it, within the scope of the transaction. In this case, you should remove the defer stmt.Close() line and instead close the statement right after the loop where you execute it.
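
Here's one way the earlier example could be restructured, closing the statement while the transaction is still open:

stmt, err := tx.Prepare("INSERT INTO foo VALUES (?)")
if err != nil {
	log.Fatal(err)
}
for i := 0; i < 10; i++ {
	if _, err := stmt.Exec(i); err != nil {
		log.Fatal(err)
	}
}
// Close the statement while the transaction is still open,
// rather than deferring it past Commit().
if err := stmt.Close(); err != nil {
	log.Fatal(err)
}
if err := tx.Commit(); err != nil {
	log.Fatal(err)
}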

By being mindful of these considerations and taking proper precautions, you can effectively work with prepared statements within transactions in Go without encountering any unexpected issues.

Database Connection Pool

When working with database connections in Go, there are several methods you can use to optimize performance and manage resources effectively. Let's dive into each method and its implications:

SetMaxOpenConns: This method allows you to set a limit on the maximum number of open connections in the connection pool. By default, there is no limit. Setting a reasonable value for MaxOpenConns ensures that your application doesn't overwhelm the database with an excessive number of concurrent queries. However, keep in mind that if the limit is reached and all connections are in use, further tasks will have to wait for a free connection. To prevent indefinite waiting, it's essential to set a timeout using a context.Context object.

SetMaxIdleConns: This method sets the maximum number of idle connections that can be kept in the pool. Idle connections are connections that are not currently in use. Allowing a higher number of idle connections can improve performance by reducing the need to establish new connections from scratch. However, keeping idle connections alive comes at a cost in terms of memory usage. Additionally, if a connection remains idle for too long, it may become unusable. It's important to strike a balance and only keep connections idle if they are likely to be used again soon. Note that the MaxIdleConns limit should always be less than or equal to MaxOpenConns, as enforced by Go.

SetConnMaxLifetime: This method sets the maximum lifetime of a connection before it is reused. By default, there is no maximum lifetime, and connections can be reused indefinitely. However, in certain situations, it can be beneficial to enforce a shorter lifetime. For example, if your database has a maximum connection lifetime enforced, you should set ConnMaxLifetime accordingly. Additionally, it can help facilitate swapping databases behind a load balancer. Keep in mind the frequency at which connections will expire and be recreated to avoid excessive connection churn.

SetConnMaxIdleTime: This method sets the maximum idle time for a connection before it is marked as expired. By default, there is no limit. Setting a reasonable ConnMaxIdleTime allows you to periodically remove idle connections that are not being used, freeing up resources in the process. This is particularly useful when you have a relatively high limit on idle connections but want to ensure efficient resource utilization.
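
Putting the four knobs together, here's a sketch of configuring the pool right after sql.Open() (the specific numbers are illustrative, not recommendations; the time package is assumed to be imported):

// Cap concurrent connections to the database.
db.SetMaxOpenConns(25)
// Keep up to 25 idle connections ready for reuse.
db.SetMaxIdleConns(25)
// Recycle any connection older than two hours.
db.SetConnMaxLifetime(2 * time.Hour)
// Drop connections that have sat idle for 15 minutes.
db.SetConnMaxIdleTime(15 * time.Minute)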

Summary:

  • Set a reasonable value for MaxOpenConns, ensuring it is below any hard limits imposed by your database and infrastructure.
  • Higher values for MaxOpenConns and MaxIdleConns generally lead to better performance, but be cautious about having too many idle connections that are not frequently reused, as it can impact performance and resource consumption.
  • Set a ConnMaxIdleTime value to remove idle connections that haven't been used for a long time, balancing resource utilization and availability.
  • Leave ConnMaxLifetime as unlimited unless your database has a specific limit or you require it for tasks like gracefully swapping databases.

By fine-tuning these settings based on your specific use case, you can ensure optimal performance and efficient resource management when working with database connections in Go.

Database connection timeouts

Understanding how the context works in transactions is crucial to ensure proper behavior and avoid errors. Let's break it down:

When you provide a context to the BeginTx() function in Go, that context applies to the entire transaction. If the context times out or gets canceled, all the queries within the transaction will automatically be rolled back. This is great because it helps maintain data consistency.

You have a couple of options for handling timeouts or cancellations. One approach is to use the same context for all the queries in the transaction. This ensures that all the queries as a whole complete before any timeout or cancellation takes effect. It's a straightforward way to manage the transaction's lifespan.

Alternatively, if you want different timeouts for individual queries within the transaction, you can create child contexts with specific timeouts for each query. The important thing to remember here is that these child contexts must be derived from the context you initially passed to BeginTx(). This ensures that if the BeginTx() context times out or gets canceled, the automatic rollback occurs, and your code won't mistakenly try to execute queries with a still-active context.

If you don't follow this rule and attempt to execute a query with a context that's still alive while the BeginTx() context has already timed out or been canceled, you'll encounter an error message saying "sql: transaction has already been committed or rolled back." This error indicates that the transaction has already been finalized and you can no longer perform any further operations on it.

Make sure you handle the context properly in transactions. Use the same context for all queries if you want them to complete together, or create child contexts derived from the BeginTx() context if you need per-query timeouts. By following these guidelines, you'll avoid errors and ensure the desired behavior in your Go transactions.
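
Here's a sketch of the per-query-timeout variant, with the child context derived from the one passed to BeginTx() (the timeout values and users table are arbitrary):

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

tx, err := db.BeginTx(ctx, nil)
if err != nil {
	log.Fatal(err)
}
defer tx.Rollback()

// Derive the per-query context from ctx, not from context.Background(),
// so it can't outlive the transaction's own context.
qCtx, qCancel := context.WithTimeout(ctx, 2*time.Second)
defer qCancel()

if _, err := tx.ExecContext(qCtx, "UPDATE users SET name = ? WHERE id = ?", "alice", 1); err != nil {
	log.Fatal(err)
}

if err := tx.Commit(); err != nil {
	log.Fatal(err)
}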

DB Migration

The command migrate create -seq -ext=.sql -dir=./migrations create_movies_table creates a pair of migration files named 000001_create_movies_table.up.sql and 000001_create_movies_table.down.sql. The -seq flag ensures sequential numbering for the migration files. We specify the .sql extension using the -ext flag. The -dir flag indicates that the migration files should be stored in the ./migrations directory, which will be created if it doesn't exist. The name create_movies_table is a descriptive label for the migration files.

When you check the ./migrations directory, you will find the newly created up and down migration files. The up file should contain the SQL statement for creating the movies table, including its columns and constraints. The down file should contain the SQL statement to drop the movies table.

For example, the up file 000001_create_movies_table.up.sql would look like this:

CREATE TABLE IF NOT EXISTS movies (
  id bigserial PRIMARY KEY,
  created_at timestamp(0) with time zone NOT NULL DEFAULT NOW(),
  title text NOT NULL,
  year integer NOT NULL,
  runtime integer NOT NULL,
  genres text[] NOT NULL,
  version integer NOT NULL DEFAULT 1
);

And the down file 000001_create_movies_table.down.sql would contain:

DROP TABLE IF EXISTS movies;

To add additional migrations, you can use the migrate create command again. For example, running migrate create -seq -ext=.sql -dir=./migrations add_movies_check_constraints would generate the migration files 000002_add_movies_check_constraints.up.sql and 000002_add_movies_check_constraints.down.sql. These files should include SQL statements for adding and dropping the necessary CHECK constraints for the movies table.
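
For instance, the up file might contain constraints along these lines (the exact rules are illustrative, not prescribed):

ALTER TABLE movies ADD CONSTRAINT movies_runtime_check CHECK (runtime >= 0);

ALTER TABLE movies ADD CONSTRAINT movies_year_check CHECK (year BETWEEN 1888 AND date_part('year', now()));

And the corresponding down file would drop them again:

ALTER TABLE movies DROP CONSTRAINT IF EXISTS movies_runtime_check;

ALTER TABLE movies DROP CONSTRAINT IF EXISTS movies_year_check;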

After creating the migration files, you can execute the migrations using the migrate command with the -path and -database flags. For example, running migrate -path=./migrations -database=$GREENLIGHT_DB_DSN up would execute the migrations and display the applied migrations with their execution time.

The sections below cover migrating to a specific version, executing down migrations to roll back changes, and using remote migration files from sources like Amazon S3 or GitHub repositories.

The migrate tool simplifies managing database schema changes by generating migration files that can be executed to modify the database structure. Each migration file contains SQL statements for updating or reverting schema changes. By using the appropriate migrate commands, you can control the execution and rollback of these migrations, ensuring proper management of your database schema.

Migrating to a specific version

If you're curious about the current migration version of your database instead of checking the schema_migrations table, you can use the migrate version command. Just run something like this:

$ migrate -path=./migrations -database=$EXAMPLE_DSN version

It will display the migration version number, which tells you where your database stands in the migration process. This can be useful for tracking the progress of your migrations.

If you want to migrate up or down to a specific version directly, you can use the goto command. For example, to migrate to version 1, you can do:

$ migrate -path=./migrations -database=$EXAMPLE_DSN goto 1

This allows you to jump directly to a particular migration version without executing all the intermediate migrations.

Executing down migrations

Sometimes you might need to roll back certain migrations. To do this, you can use the down command, which lets you roll back by a specific number of migrations. For instance, if you want to roll back the most recent migration, simply run:

$ migrate -path=./migrations -database=$EXAMPLE_DSN down 1

It will undo the changes made by the most recent migration, effectively rolling back your database schema.

Remote migration files

The migrate tool also supports reading migration files from remote sources like Amazon S3 or GitHub repositories. This can be handy when you want to store your migration files in a remote location instead of locally.

For example, to fetch migration files from an S3 bucket, you can use a command like this:

$ migrate -source="s3://<bucket>/<path>" -database=$EXAMPLE_DSN up

Similarly, if you want to use migration files stored in a GitHub repository, you can specify the repository details:

$ migrate -source="github://owner/repo/path#ref" -database=$EXAMPLE_DSN up

You can even provide authentication by including your personal access token if needed:

$ migrate -source="github://user:personal-access-token@owner/repo/path#ref" -database=$EXAMPLE_DSN up

This feature enables you to manage your migration files remotely, making it convenient for collaborative or distributed development scenarios.

In summary, these additional features of the migrate tool allow you to migrate to specific versions, rollback migrations, and even fetch migration files from remote sources. They provide flexibility and control in managing your database schema changes.