Node.js Streams: A Practical Guide
Streams are a powerful way to move data efficiently in Node.js, whether that data is flowing over the network or being read from disk. The concept isn't unique to Node.js: Unix programs have long passed data to one another through the pipe operator. Before diving into Node.js streams, let's first understand buffers.
Buffers: The Foundation
A buffer is a temporary holding area for a chunk of data that is in transit from one place to another. Data accumulates in the buffer, and once enough has gathered it is handed on to its destination for processing.
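To make this concrete, here is a tiny sketch of working with a Buffer directly (the text is just an illustration):

// A Buffer holds raw bytes; here we create one from a string
const buf = Buffer.from('Hello, streams!')

console.log(buf.length)       // 15 (bytes)
console.log(buf.toString())   // 'Hello, streams!'

// Read streams hand you data as Buffer chunks like this one,
// each bounded in size (64 KB by default for file streams)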
What Are Streams?
Streams transfer data in very small chunks, filling a buffer as they go. As soon as enough data has been buffered, it is made available to the consumer instead of waiting for the entire payload. A real-world example of streams and buffers is YouTube: you can start watching a video well before the whole file has downloaded. Streams are essential for building performant Node.js applications.
Why Streams Matter: A Practical Example
Imagine you want to upload a 10GB file to a server. Without streams, the naive approach is to receive the entire file into memory first and only then write it to storage. But here's the problem: if the server has only 2GB of RAM, it simply cannot hold the whole upload in main memory at once, so a single upload could never exceed 2GB.
The solution is to stream the upload: as each chunk arrives it is written to disk and its memory is freed, so RAM usage stays bounded no matter how large the file is. The same idea applies to any data your program consumes as it arrives from external sources. Streams provide a low-level, performant way to access sources like sockets and files with large amounts of data, and they are particularly helpful for performance because RAM isn't cluttered when dealing with large payloads.
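As a rough sketch, assuming the upload arrives as the raw request body, a server can hand the request stream straight to a file write stream so memory stays bounded (the file name upload.bin is just an illustration):

import http from 'http'
import fs from 'fs'

const server = http.createServer((req, res) => {
  // req is a readable stream: each chunk is written to disk as it arrives,
  // so we never hold more than a small buffer of the 10GB upload in RAM
  const file = fs.createWriteStream('upload.bin')
  req.pipe(file)
  file.on('finish', () => {
    res.writeHead(200, { 'Content-Type': 'text/plain' })
    res.end('Upload complete\n')
  })
})

server.listen(3000, '127.0.0.1')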
Another powerful feature of streams is that you can connect them together—you can take one stream and pipe its output to another stream, creating powerful data processing pipelines.
Types of Node.js Streams
Node.js primarily provides four types of streams:
1. Read Stream
Read streams let Node.js consume data from a source, such as a file, one chunk at a time.
import fs from 'fs'

// ES modules have no __dirname, so resolve the file relative to this module
const ReadStream = fs.createReadStream(new URL('./readMe.txt', import.meta.url), 'utf8')

ReadStream.on('data', (chunk) => {
  console.log('New chunk received')
  console.log(chunk)
})
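Continuing from the ReadStream above, readable streams also emit an 'end' event when the source is exhausted and an 'error' event if something goes wrong (for example, the file does not exist), so it is worth handling both:

ReadStream.on('end', () => console.log('No more data'))
ReadStream.on('error', (err) => console.error('Something went wrong:', err))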
2. Write Stream
Write streams let Node.js send data to a destination, such as a file, one chunk at a time.
import fs from 'fs'

const ReadStream = fs.createReadStream(new URL('./readMe.txt', import.meta.url))
const WriteStream = fs.createWriteStream(new URL('./writeMe.txt', import.meta.url))

// Write each chunk to the destination as soon as it arrives
ReadStream.on('data', (chunk) => {
  console.log('New chunk received')
  WriteStream.write(chunk)
})
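One detail this manual approach glosses over is backpressure: write() returns false once the destination's internal buffer is full, and the source is expected to pause until the 'drain' event fires. A minimal sketch of handling that by hand, using the same files as above:

import fs from 'fs'

const ReadStream = fs.createReadStream(new URL('./readMe.txt', import.meta.url))
const WriteStream = fs.createWriteStream(new URL('./writeMe.txt', import.meta.url))

ReadStream.on('data', (chunk) => {
  // write() returns false when WriteStream's internal buffer is full
  if (!WriteStream.write(chunk)) {
    ReadStream.pause()                                    // stop reading for now
    WriteStream.once('drain', () => ReadStream.resume())  // continue once the buffer empties
  }
})
ReadStream.on('end', () => WriteStream.end())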
Using Pipes for Cleaner Code
Instead of manually handling data events and backpressure yourself, you can use pipe to join two streams together; pipe pauses and resumes the source for you:
import fs from 'fs'

const ReadStream = fs.createReadStream(new URL('./readMe.txt', import.meta.url))
const WriteStream = fs.createWriteStream(new URL('./writeMe.txt', import.meta.url))

// pipe() returns the destination, so this error handler only covers WriteStream
ReadStream.pipe(WriteStream).on('error', console.error)
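Because pipe() returns the destination stream, the error handler above only catches errors on the write stream; errors on the read stream need their own handler. For cases like this, Node.js also provides stream.pipeline, which forwards errors from every stream in the chain and cleans all of them up on failure. A small sketch using the promise-based version:

import fs from 'fs'
import { pipeline } from 'stream/promises'

try {
  // pipeline() wires the streams together and rejects if any of them errors
  await pipeline(
    fs.createReadStream(new URL('./readMe.txt', import.meta.url)),
    fs.createWriteStream(new URL('./writeMe.txt', import.meta.url))
  )
  console.log('Copy finished')
} catch (err) {
  console.error('Pipeline failed:', err)
}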
Streaming to HTTP Responses
You can also stream data to the client by piping a read stream into the HTTP response:
import http from 'http'
import fs from 'fs'

const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' })
  // Stream the file straight into the response instead of loading it into memory
  const ReadStream = fs.createReadStream(new URL('./readMe.txt', import.meta.url))
  ReadStream.pipe(res)
})

server.listen(3000, '127.0.0.1')
3. Duplex Stream
Duplex streams are both readable and writable, so Node.js can read from and write to them. They typically sit in the middle of a pipeline, between a readable source and a writable destination.
Basic Duplex Example with PassThrough
import { PassThrough } from 'stream'
import { createReadStream, createWriteStream } from 'fs'

const ReadStream = createReadStream('input.txt')
const WriteStream = createWriteStream('output.txt')

// PassThrough is the simplest duplex stream: it emits exactly what it receives
const tunnel = new PassThrough()

let total = 0
tunnel.on('data', (chunk) => {
  total += chunk.length
  console.log(`${total} bytes passed through`)
})

ReadStream.pipe(tunnel).pipe(WriteStream)
Here, PassThrough simply passes the input bytes across to the output unchanged, which makes it a convenient place to observe or count the data flowing through the middle of a pipeline.
Custom Duplex: Throttle Stream
You can also write your own duplex stream. In this example we build a throttle that pushes each chunk along with a delay of 10 milliseconds:
import { Duplex, PassThrough } from 'stream'
import { createReadStream, createWriteStream } from 'fs'

const ReadStream = createReadStream('input.txt')
const WriteStream = createWriteStream('output.txt')

class Throttle extends Duplex {
  constructor(ms) {
    super()
    this.delay = ms
  }

  // No-op: data is pushed from _write instead of being pulled here
  _read() {}

  // Push each incoming chunk to the readable side, then wait before asking for more
  _write(chunk, encoding, callback) {
    this.push(chunk)
    setTimeout(callback, this.delay)
  }

  // Signal the end of the readable side once all writes are done
  _final(callback) {
    this.push(null)
    callback()
  }
}

const tunnel = new PassThrough()
const throttle = new Throttle(10)

let total = 0
tunnel.on('data', (chunk) => {
  total += chunk.length
  console.log(total)
})

ReadStream.pipe(throttle).pipe(tunnel).pipe(WriteStream)
To create this custom duplex stream, we define a Throttle class extending Duplex from the stream module. Because a duplex stream is both readable and writable, we implement both sides: _write receives each chunk, pushes it onto the readable side, and delays its callback to slow the flow; _read can stay empty since data is pushed from _write; and _final pushes null to signal that no more data will arrive.
4. Transform Stream
Transform streams allow you to filter and transform data as it passes through the stream.
import { Transform } from 'stream'

class ReplaceText extends Transform {
  constructor(char) {
    super()
    this.replaceChar = char
  }

  // Replace every letter and digit in the chunk with the chosen character
  _transform(chunk, encoding, callback) {
    const transformChunk = chunk.toString().replace(/[a-zA-Z0-9]/g, this.replaceChar)
    this.push(transformChunk)
    callback()
  }

  // Called once the input ends, just before the stream closes
  _flush(callback) {
    this.push('more stuff is being passed through...')
    callback()
  }
}

const xStream = new ReplaceText('x')
process.stdin.pipe(xStream).pipe(process.stdout)
Here, ReplaceText replaces every letter and digit in the incoming text with the supplied character. Many transform streams ship with Node.js itself: the built-in zlib module, for example, provides a transform stream that compresses data as it flows from a read stream to a write stream.
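As a rough sketch of that zlib transform stream in action, the following compresses readMe.txt into readMe.txt.gz (the file names are just an illustration):

import fs from 'fs'
import zlib from 'zlib'
import { pipeline } from 'stream/promises'

try {
  // Each chunk read from the file is gzipped as it passes through createGzip()
  await pipeline(
    fs.createReadStream(new URL('./readMe.txt', import.meta.url)),
    zlib.createGzip(),
    fs.createWriteStream(new URL('./readMe.txt.gz', import.meta.url))
  )
  console.log('readMe.txt compressed to readMe.txt.gz')
} catch (err) {
  console.error('Compression failed:', err)
}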
Conclusion
Streams are a fundamental part of Node.js that enable efficient handling of large amounts of data. By understanding how to use read streams, write streams, duplex streams, and transform streams, you can build highly performant applications that handle data efficiently without overwhelming system memory.
Whether you're building file processing systems, HTTP servers, or data transformation pipelines, streams provide the tools you need to work with data in a memory-efficient and scalable way.
Want to learn more about Node.js performance? Check out the official Node.js streams documentation.