
Export CSV with Node.js


Introduction

CSV, or Comma-Separated Values, is a widely used format that represents structured data as plain text. Each line in a CSV file typically represents a row of data, while the individual data points separated by commas correspond to columns. Due to its simplicity and readability, CSV has become a standard file format for data exchange between applications, making it a favourite among data analysts, developers, and database administrators.

This article aims to help you export a sample dataset to a CSV file using Node.js. Node.js, an open-source JavaScript runtime, has emerged as a powerful platform for building scalable network applications. There are a few significant reasons to choose it as the preferred platform here:

  • Performance: Node.js is non-blocking and event-driven, which means it can handle many operations concurrently without waiting for one to complete. This makes it particularly suited for I/O-bound tasks like file operations.
  • NPM Ecosystem: Node.js has access to an extensive package ecosystem through npm, with libraries like papaparse that make working with CSV files straightforward and efficient.
  • Flexibility: Since Node.js is JavaScript-based, it easily integrates with various databases and backend systems, making the process of fetching and exporting data seamless.
  • Scalability: Whether you're exporting a few rows or several million, Node.js can be scaled to meet your requirements.

This article will delve deeper into how you can efficiently leverage Node.js to export data as CSV files. But before that, let’s understand what the CSV format is.

Understanding the CSV Format

A Comma-Separated Values (CSV) file is a plain text format that organizes data into individual rows, with each row representing a record. Within these rows, fields or data points are separated by commas (,), hence the name. However, it is also common to see other delimiters, like semicolons (;), depending on the settings or the use case. Let’s take a look at an example to understand the format better.

FirstName,LastName,Email,Position,DateJoined
John,Doe,johndoe@email.com,Manager,2020-01-15
Jane,Smith,janesmith@email.com,Developer,2019-05-23
Alan,Turing,alanturing@email.com,Data Scientist,2018-03-10
Ada,Lovelace,adalovelace@email.com,Software Engineer,2022-07-01
Ellen,Johnson,ellenjohnson@email.com,HR Specialist,2021-11-05

The above snippet consists of an imaginary dataset of employee details. CSV files are usually saved with a .csv extension. A CSV file typically consists of two parts:

  • Header Row: Describing the content of each column (FirstName, LastName, Email, Position, DateJoined).
  • Data Rows: Containing the actual data for each employee. Each row corresponds to a single employee's details.

When you view this in software that can interpret CSV, like Microsoft Excel or Google Sheets, you'll see the data neatly organized into rows and columns based on the structure defined by the CSV format.

The need for CSV export arises in many contexts, and the format provides multiple advantages:

  • Universality: Almost every data processing tool or software, from Excel to major databases, can read and understand CSV files.
  • Readability: Humans can easily open and read CSV files using a simple text editor. This makes it convenient for quick data checks or edits.
  • Transfer Efficiency: CSV files are smaller in size than many other formats, making them efficient for data transfer.
  • Flexibility: Allows for a wide range of data, from simple lists to complex data sets with multiple columns.

Now that you have a basic understanding of what CSV is, let’s dive into the code.

Exporting CSV in Node.js

As mentioned earlier, this article uses Node.js to export CSV files. But before jumping into the code, please make sure that you have a basic understanding of Node.js and NPM. You can move to the next section if you are familiar with the two.

If you want to check the code directly, visit this repo.

Setting the Node.js Environment

Before writing your code, let’s first set up the Node.js environment. Create a new directory for the project and open the terminal there. Run the following command to initialize the folder with NPM.

npm init -y

This command will generate a package.json file with default settings. The -y flag means you're accepting the defaults. You can omit -y if you wish to provide specific answers for the initialization.
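
For reference, the generated package.json will look something like the following (the name field comes from your folder name, so the value below is just a placeholder):

{
  "name": "csv-export",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}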

Once the command executes successfully, you are ready to go. Open this folder with your favourite code editor to start writing your code.

Two methods will be discussed in this article. The first uses built-in Node.js modules, and the second uses a third-party package.

Using Built-in Methods

First, try exporting a JSON file to a CSV file using the built-in Node.js modules. While this is possible, it’s not always the ideal solution; the reasons will be discussed later. Let’s first write some code.

Create a new file called data.json inside your working directory and paste the code from this file. It contains the dummy JSON data that’ll be exported.

Once done, create another file called builtIn.js inside the root directory and paste the following code:

const fs = require("fs").promises;

async function convertToCSV() {
  try {
    // Read the source JSON and pull out the users array
    const rawJsonData = await fs.readFile("data.json", "utf8");
    const jsonData = JSON.parse(rawJsonData);
    const { users } = jsonData;

    // Replace null values with empty strings during stringification
    const replacer = (_, value) => (value === null ? "" : value);
    // Use the first user's keys as the CSV header row
    const header = Object.keys(users[0]);
    const csv = [
      header.join(","),
      ...users.map((row) =>
        Object.values(row)
          .map((value) => JSON.stringify(value, replacer))
          .join(",")
      ),
    ].join("\r\n");

    await fs.writeFile("users.csv", csv);
    console.log("CSV file successfully created!");
  } catch (error) {
    console.error(error);
  }
}

convertToCSV();

Let’s break down the code and understand it in a step-by-step manner.

  • The first step is to import the built-in fs module. Specifically, the asynchronous version of this module (fs.promises) is imported to allow promise-based file operations. The fs or File System module handles the read, write, delete, and other such operations.
  • After the import, a new function called convertToCSV is defined. It is defined as an asynchronous function.
  • The function then reads the content of the data.json file and retrieves this content as a UTF-8 encoded string. This string is then parsed into a JSON object. The users array is extracted from the parsed JSON object.
  • A function named replacer is defined to help in the stringification process of the JSON data. If a value is null, this function will replace it with an empty string.
  • After that, the first user object's property names (or keys) are retrieved. These keys will be used as the CSV data's header or column names.
  • The CSV data is constructed in two parts:
    • First, the header is converted into a comma-separated string.
    • Second, each user in the users array is processed:
      • For each user, the values of its properties are extracted.
      • These values are then stringified, taking care to handle special characters or null values.
      • The stringified values are joined with commas to form a row for that user in the CSV data.
    • All rows, including the header, are then joined together with carriage return and newline characters to create the final CSV format.
  • The constructed CSV string is written to a new file named users.csv.
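
To make the serialization concrete, here is what the row-building logic produces for a single record (sampleUser is a made-up object matching the shape of the records in data.json):

// A made-up user object matching the shape of the records in data.json
const sampleUser = {
  id: 1,
  name: "John Doe",
  email: "johndoe@example.com",
  age: 25,
  gender: "Male",
  city: "New York",
};

const replacer = (_, value) => (value === null ? "" : value);
const row = Object.values(sampleUser)
  .map((value) => JSON.stringify(value, replacer))
  .join(",");

console.log(row);
// => 1,"John Doe","johndoe@example.com",25,"Male","New York"
// JSON.stringify leaves numbers bare and wraps strings in double quotes,
// which is valid CSV quoting.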

The function is now ready to execute. If you run the script with the following command, you’ll see a new file called users.csv is created in the root directory:

node builtIn.js

The output of the code will look like this. Note that JSON.stringify leaves numbers bare but wraps string values in double quotes; that is valid CSV quoting and is handled transparently by spreadsheet software:

id,name,email,age,gender,city
1,"John Doe","johndoe@example.com",25,"Male","New York"
2,"Jane Doe","janedoe@example.com",30,"Female","Los Angeles"
3,"Bob Smith","bobsmith@example.com",40,"Male","Chicago"
4,"Alice Johnson","alicejohnson@example.com",28,"Female","Houston"
5,"David Lee","davidlee@example.com",35,"Male","San Francisco"
6,"Emily Chen","emilychen@example.com",27,"Female","Seattle"
7,"Michael Brown","michaelbrown@example.com",45,"Male","Miami"
8,"Sarah Davis","sarahdavis@example.com",32,"Female","Boston"
9,"William Wilson","williamwilson@example.com",29,"Male","Atlanta"
10,"Olivia Taylor","oliviataylor@example.com",31,"Female","Denver"
11,"James Lee","jameslee@example.com",26,"Male","San Diego"
12,"Sophia Kim","sophiakim@example.com",33,"Female","Portland"
13,"Christopher Clark","christopherclark@example.com",39,"Male","Dallas"
14,"Ava Hernandez","avahernandez@example.com",24,"Female","Phoenix"
15,"Daniel Baker","danielbaker@example.com",36,"Male","Las Vegas"
16,"Mia Perez","miaperez@example.com",28,"Female","Austin"
17,"Matthew Turner","matthewturner@example.com",42,"Male","Philadelphia"
18,"Isabella Collins","isabellacollins@example.com",29,"Female","Washington D.C."
19,"Andrew Rodriguez","andrewrodriguez@example.com",34,"Male","San Antonio"
20,"Chloe Martinez","chloemartinez@example.com",27,"Female","New Orleans"

Using the built-in methods for CSV export has its advantages and disadvantages. Let’s discuss the benefits first:

  1. You're not relying on third-party libraries or packages. This means fewer security vulnerabilities and less overhead when updating or maintaining your project.
  2. You have complete control over the conversion process. This allows you to customize or optimize the method according to your specific needs.
  3. Since there are no additional layers from third-party libraries, the method might offer better performance in certain scenarios.

But the disadvantages might outweigh the benefits of the method:

  • If your data contains special characters, such as commas or newlines within the values themselves, you might have to add additional logic to handle these scenarios (a minimal escaping helper is sketched after this list). Many libraries have built-in solutions for these cases.
  • Third-party CSV libraries might offer more features, such as stream-based processing for very large files, advanced error handling, or support for different delimiters.
  • Writing and testing your own implementation can be more time-consuming than using a tried-and-tested library.
  • Manual implementations can be more susceptible to errors, especially if you're not familiar with all the nuances of the CSV format or if you overlook some edge cases.
  • As data requirements evolve, you need to update your conversion code. With third-party libraries, these updates might come as part of library updates.
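
As a sketch of the extra logic the first point above refers to, a minimal RFC 4180-style escaping helper could look like this (toCsvValue is a hypothetical name, not part of any library). You could swap it in for the JSON.stringify call in builtIn.js:

// Quote a field if it contains a comma, a double quote, or a line break,
// and escape embedded double quotes by doubling them (RFC 4180)
const toCsvValue = (value) => {
  if (value === null || value === undefined) return "";
  const str = String(value);
  return /[",\r\n]/.test(str) ? `"${str.replace(/"/g, '""')}"` : str;
};

console.log(toCsvValue("Washington, D.C.")); // => "Washington, D.C."
console.log(toCsvValue('He said "hi"')); // => "He said ""hi"""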

While using built-in Node.js methods to export to CSV can be straightforward and effective for simple cases, more complex scenarios might benefit from the use of specialized libraries.

Now, let’s take a look at CSV export using a third-party library called Papaparse.

Using a Third-party Library

Exporting CSV using the built-in methods can work when the dataset is small and contains no special characters that can cause issues. In real-world scenarios, however, it is better to use a tried-and-tested package that does the job well. Papaparse is one such package. It bills itself as the fastest CSV parser for the browser, is easy to use, and offers features like automatic delimiter detection, pausing, resuming, and aborting a parse, and optional jQuery integration. Even though it is aimed mainly at in-browser parsing, it can be used in conjunction with the fs module to work on the server side.
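
Although this article focuses on export, it helps to see the parsing side that those features refer to. Here is a minimal sketch (the csvString sample below is made up for illustration):

const Papa = require("papaparse");

// header: true maps each row to an object keyed by the header row;
// dynamicTyping converts numeric strings into numbers
const csvString = "id,name,age\n1,John Doe,25\n2,Jane Smith,30";
const result = Papa.parse(csvString, { header: true, dynamicTyping: true });

console.log(result.data);
// => [ { id: 1, name: 'John Doe', age: 25 }, { id: 2, name: 'Jane Smith', age: 30 } ]

Using this package, let’s now export the same data.json file to a CSV file.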

First, install the package by running npm install papaparse. Then, create a new file called papaparse.js in your root directory and paste the following code:

const fs = require("fs").promises;
const Papa = require("papaparse");

async function convertToCSV() {
  try {
    // Read the source JSON and pull out the users array
    const rawJsonData = await fs.readFile("data.json", "utf8");
    const { users } = JSON.parse(rawJsonData);

    // unparse converts an array of objects into a CSV string,
    // handling quoting and special characters automatically
    const csv = Papa.unparse(users);

    await fs.writeFile("users.csv", csv);
    console.log("CSV file successfully created!");
  } catch (error) {
    console.error(error);
  }
}

convertToCSV();

The above code will convert the data.json file to a users.csv file using the Papaparse package and write it to the file system using the fs module. Let’s break down the code:

  • The code begins by importing the fs module's promise-based functions from Node.js, which enable asynchronous file operations without callbacks. Following this, the papaparse library is imported.
  • An asynchronous function named convertToCSV is again defined in this file. This function is responsible for the entire process of converting the JSON data to a CSV format.
  • Inside this function, the content of data.json is read as a UTF-8 string via the readFile method available in the fs module.
  • Next, the string is parsed into a JSON object. The users array is then destructured from the parsed object, effectively extracting the array that will be converted to CSV.
  • With the extracted users array, Papaparse's unparse method is used to convert this array of objects into a CSV string. It's worth noting that Papaparse handles edge cases, special characters, and various nuances of CSV formatting out-of-the-box.
  • Having obtained the CSV-formatted string, this data is written to a file named users.csv. The writeFile method from the fs module is used for this purpose.
  • Once the file-writing is successful, a success message is logged into the console.
  • The entire process within the function is encapsulated within a try-catch block. This means that if any step fails or throws an error, the code won't crash. Instead, it will jump to the catch block and log the error with console.error(error);. This provides a clearer insight into any potential issues without halting the entire operation.
  • Finally, the convertToCSV function is invoked outside the function, setting the whole process into motion.

Now, if you run your code using the command node papaparse.js, you can see the newly exported CSV file in the root directory.
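
Papa.unparse also accepts an optional configuration object, which is handy when you need a different delimiter or want every field quoted. A short sketch (the employees array is made up for illustration):

const Papa = require("papaparse");

const employees = [
  { name: "Isabella Collins", city: "Washington, D.C." },
  { name: "John Doe", city: "New York" },
];

// quotes: true forces double quotes around every field; delimiter swaps
// the separator. Fields containing the delimiter would be quoted
// automatically even without the quotes option.
const csv = Papa.unparse(employees, { quotes: true, delimiter: ";" });
console.log(csv);
// => "name";"city"
//    "Isabella Collins";"Washington, D.C."
//    "John Doe";"New York"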

The advantages of using this method are:

  • It is a robust package for CSV parsing. Papaparse is known for its comprehensive handling of CSV parsing, managing edge cases, special characters, and intricate CSV-specific details.
  • Papaparse is a feature-rich package. It offers features such as auto delimiter detection, pausing and resuming, aborting, and optional jQuery integration. Built-in methods would require manual implementation for such functionalities.
  • While the focus here is server-side, it's worth mentioning that Papaparse is optimized for in-browser parsing, which can be beneficial if the same functionality needs to be replicated on the client side.
  • As an established library, Papaparse is actively maintained, which means frequent updates, bug fixes, and new feature releases.

At the same time, there are a few disadvantages:

  • Using Papaparse introduces an external dependency to your project. This can increase the complexity of managing packages and might expose security concerns if not updated regularly.
  • For basic CSV parsing and generation tasks, introducing Papaparse might be an over-complication. For simpler needs, built-in methods would suffice and be more efficient.
  • Even though Papaparse is optimized, there might be specific scenarios where custom-built methods might outperform it, especially when tailored for specialized needs.
  • When used on the client side, adding Papaparse could increase the size of your scripts, potentially affecting load times.

Even though Papaparse brings advanced parsing capabilities and a feature-rich environment, it's essential to determine whether its advantages outweigh the overhead of introducing a new library, especially for simpler tasks that built-in methods might handle efficiently.

If you want to check out the complete code for this article, it is available in this repo.

Conclusion

This article covered CSV export using Node.js. You learned about the CSV format and why it is useful, and you discovered two methods that can be used for CSV export. Both methods have their benefits and drawbacks, so it is important to understand your specific use case and choose a path accordingly.
