Written by Andrew Serra · May 16, 2024

Word Counter Using Producer-Consumer Pattern

Projects · Tutorial

Parallel computing matters for performance and for efficient use of resources. One common approach is the Producer-Consumer Pattern: some threads create data (the producers), and others process it (the consumers). Splitting the work this way lets a program reach its goal faster.

In this post, I will explain the Producer-Consumer Pattern and then implement a simple program in Go that counts the number of words in a file.

What is a Producer-Consumer Pattern?


This pattern organizes data processing into a pipeline where different threads take on different roles. A real-life example is a restaurant kitchen: in the back, cooks prepare the various ingredients, and the front of the kitchen assembles everything into one presentable plate. The cooks are the producers, each taking a specific ingredient and cooking it; the presenters are the consumers, assembling the results.

This is quite useful when data arrives as a continuous stream, or when the input is large enough that sequential processing becomes slow. Counting words in a file is a great example. If the file is large, we can break it into chunks of bytes, hand each chunk to a producer thread that counts its words, and pass the per-chunk counts to a consumer that tallies them into a total.

[Figure: the Producer-Consumer Pattern, visualized]

How is Producer-Consumer different from MapReduce?


MapReduce was developed at Google to handle large amounts of processing and storage, and their paper describes the implementation and how it was tested under various conditions.

MapReduce can be seen as a superset of this pattern; the difference lies in how the processing is organized. MapReduce adds roles that improve processing time and store intermediate results, and it provides reliability: if a node goes down, the system recovers and redistributes its load, adapting to network load conditions, worker availability, and other possible hurdles.


MapReduce utilizes concurrency both on a single device and across multiple devices on a network, with a master node orchestrating the workers.

Word Counter Application in Go

My goal with this small project is to demonstrate how to implement the Producer-Consumer Pattern for word counting in Go. The producers count the words in individual chunks, and a single consumer sums those counts into one total.

What are Go Routines?


In Go, concurrency is built around goroutines, which I'll be utilizing here. A goroutine is similar to a thread, but goroutines are "virtual threads": they are scheduled by the Go runtime and multiplexed onto OS threads rather than mapping one-to-one. Because the runtime manages them, it can detect when every goroutine is blocked and report a deadlock. Goroutines are also lightweight and easy to use.
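As a warm-up, here is a minimal, self-contained illustration of goroutines coordinated with a WaitGroup and a channel. It is not part of the word counter; the `squares` function and its inputs are made up for this sketch:

```go
package main

import (
	"fmt"
	"sync"
)

// squares computes n*n for each input value concurrently,
// spawning one goroutine per value.
func squares(nums []int) []int {
	results := make(chan int, len(nums)) // buffered, so senders never block
	var wg sync.WaitGroup

	for _, n := range nums {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			results <- n * n
		}(n)
	}

	// Wait for every goroutine, then close the channel so range ends.
	wg.Wait()
	close(results)

	out := []int{}
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	fmt.Println(squares([]int{1, 2, 3})) // order is nondeterministic
}
```

Note that the results arrive in whatever order the goroutines happen to finish, which is why the word counter below only needs a running sum, not ordered output.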

Producer-Consumer Pattern Code

First, we define a main function responsible for receiving the name of the file whose words will be counted and printing the resulting count. Very simply, here is the code:

package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	var filename string

	flag.StringVar(&filename, "filename", "", "Filename to process")

	flag.Parse()

	if filename == "" {
		fmt.Println("invalid file name")
		os.Exit(1)
	}

	wc, err := countWords(filename)

	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	fmt.Printf("Words counted: %d\n", wc)
}

File Reading

Read moreNon-hassle, Easy, And Simple Ping Tool In Go

The part we will look at here is the countWords function. The goal is to open the file, which also verifies that it exists, and read it in chunks of 512 bytes. The chunk size is configurable, and a sweet spot can be found by trial and error.

func countWords(filename string) (int, error) {
	var count int
	fileIn, err := os.Open(filename)

	if err != nil {
		return 0, err
	}

	defer fileIn.Close()

	reader := bufio.NewReader(fileIn)
	chunkSize := 512

Producer Spawning

processChan := make(chan int)
var wg sync.WaitGroup

for {
	buf := make([]byte, chunkSize)
	numBytes, err := reader.Read(buf)

	if numBytes > 0 {
		wg.Add(1)
		go mapper(buf, numBytes, processChan, &wg)
	}

	if err == io.EOF {
		break
	}
	if err != nil {
		fmt.Printf("error reading file: %s\n", err)
		return 0, err
	}
}

The next step is to create a channel and spawn goroutines to count the words in each chunk. These are our producers, or in MapReduce terms, the mappers. We loop, spawning one producer per chunk, until the file is completely read.


A wait group synchronizes all the producer goroutines. Once they have all finished, the channel has to be closed; otherwise the consumer's range loop would block forever and the runtime would report a deadlock. To prevent this, we create a separate goroutine in the background that waits for all the producers to end and then closes the channel.

go func() {
	wg.Wait()
	close(processChan)
}()

Gathering All Pieces of Code

Finally, run a consumer, or reducer, to sum the counts from each processed chunk and return the total. Since there is a single consumer, no mutexes are needed to manage resource sharing. Below are the definitions of the producer and consumer.

// mapper counts the words in one chunk and sends the count on the channel.
func mapper(data []byte, n int, c chan int, wg *sync.WaitGroup) {
	defer wg.Done()
	chunk := string(data[:n])
	c <- len(strings.Fields(chunk))
}

// reducer drains the channel, accumulating the per-chunk counts.
func reducer(count *int, c chan int) {
	for num := range c {
		*count += num
	}
}


The full process is as follows:

  1. Open the file
  2. Read a chunk of the file
  3. Spawn a producer goroutine with the data chunk
  4. Repeat steps 2 and 3 until the file is fully read
  5. Create a goroutine that waits for all producers to finish, then closes the channel
  6. Sum all word counts from the channel
  7. Print the sum

The full code can be found on my GitHub. Check out my Credit Card Validator API post, also written in Go.


Thanks for reading this post!

If you’d like to contact me about anything, send feedback, or want to chat feel free to:

Send an email: andy@serra.us

Tags: go, go routines, parallel computing, producer-consumer pattern, threading


Copyright Andrew Serra 2025 | Theme by ThemeinProgress | Proudly powered by WordPress