
A primer into multithreaded programming

Mira Yurizaki

This is a primer on what I (and probably others) would consider an advanced topic. Not because it's hard to wrap your head around, but because it's hard to do effectively. It's like the board game Go: the concepts are easy enough, but boy howdy do you need to really get your head around the game to play at a pro level. Before I go further though, I want to lay out some things that this primer is and is not.

  • This primer glosses over the fundamentals of multithreaded programming and the issues that come with it
  • This will not teach you how to code multi-threaded programs. I will have examples, but that's about it.
  • This will not teach you how to take your single-threaded program and turn it into a multi-threaded one.

The basics: What are threads?


A program is divided into two major categories as far as scheduling and resource management are concerned: processes and threads. A process is the program itself. A thread is a task or subtask of the program. To provide a real-world analogy, you can think of a process as a full course meal. Serving this meal requires several tasks, such as preparing the ingredients, cooking them (if needed), plating them, and serving them. A lot of these tasks can be independent. For example, if the main dish is steak, one person can cook the steak while another prepares and serves the appetizers.

 

To put this in an application perspective, let's take a web browser. The browser itself is the process. It can have threads to manage the network connection, render the web page, run JavaScript or other client code, etc. This allows the application to run certain parts of itself without affecting the rest. Without threads, a GUI-based application will appear to freeze whenever it performs an action that takes time, like a large computation or waiting for something on the internet. Because the busy task lives on the same thread as the UI, the application can't service anything UI-related, so the interface locks up.
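To make the frozen-UI point concrete, here's a minimal sketch in Python 3 (the article's own examples are Python 2) that moves a slow task onto a background thread so the "UI" loop keeps running. The names slow_task and ui_ticks are mine for illustration, not from any GUI framework.

```python
import threading
import time

def slow_task(done):
    # Stands in for a long-running job, e.g. a network request.
    time.sleep(0.5)
    done.set()

done = threading.Event()
worker = threading.Thread(target=slow_task, args=(done,))
worker.start()

# The "UI" keeps servicing events while the task runs elsewhere.
ui_ticks = 0
while not done.is_set():
    ui_ticks += 1
    time.sleep(0.01)

worker.join()
print("UI serviced %d events while waiting" % ui_ticks)
```

If the sleep were on the UI thread instead, ui_ticks would stay at zero until the task finished, which is exactly the freeze users complain about.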

 

Threads have a life cycle, falling into one of these states:

  • Running: The thread is currently running on a processor. Can transition to Blocked/Sleeping, Ready, or Done.
  • Blocked/Sleeping: The thread is waiting on some I/O device to be ready or is waiting to receive a signal. Can transition to Ready.
  • Ready: The thread is ready to run and is waiting its turn. Can transition to Running, or back to Blocked/Sleeping if for some reason it can't actually run.
  • Done: The thread is done and its resources can be reaped.
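Python's threading module only exposes a coarse view of this life cycle (alive or not), but a small Python 3 sketch can still show the created, running/sleeping, and done stages; the task function here is my own illustration.

```python
import threading
import time

def task(started):
    started.set()          # roughly: Running
    time.sleep(0.2)        # roughly: Blocked/Sleeping

started = threading.Event()
t = threading.Thread(target=task, args=(started,))

print(t.is_alive())   # False: created but not yet started
t.start()
started.wait()
print(t.is_alive())   # True: somewhere between Running and Blocked/Sleeping
t.join()              # wait until the thread is Done
print(t.is_alive())   # False: Done, its resources can be reaped
```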

Threads are the unit of scheduling in most modern OSes. That is, the OS does not pick a process first and then look at its threads. The reason is that processes own a lot more resources than threads do: switching between threads is much quicker than switching between processes, and when the OS scheduler looks at which threads are ready to run, they often come from the same process anyway.
 

There's another unit of scheduling called fibers. The main difference is that threads can be preempted, while fibers are scheduled cooperatively: a fiber runs until it explicitly yields control back to the system.
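Python doesn't ship fibers, but generators capture the cooperative idea: each unit of work runs until it chooses to yield. A toy round-robin scheduler (my own sketch, not a real fiber API) makes the contrast with preemption visible, because the interleaving is fully deterministic:

```python
def fiber(name, steps):
    # A unit of work that voluntarily hands control back at each step.
    for i in range(steps):
        yield "%s step %d" % (name, i)

def run_round_robin(fibers):
    # A tiny cooperative scheduler: no fiber is ever preempted.
    log = []
    while fibers:
        f = fibers.pop(0)
        try:
            log.append(next(f))
            fibers.append(f)     # goes to the back of the queue
        except StopIteration:
            pass                 # fiber finished, drop it
    return log

log = run_round_robin([fiber("A", 2), fiber("B", 2)])
print(log)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1']
```

With real threads, the order of those four lines would be up to the OS scheduler; with fibers, it's up to the fibers themselves.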

 

How do you create threads?


First, the programming language must support multithreading, and most modern languages do. For example, Python has the thread and threading modules, and POSIX C has the pthread library.

 

Since a thread is basically a task within the program, most threads are created by calling the library's thread-creation function and feeding it a function along with its arguments, if any. Here's an example of a simple multithreaded program in Python, where each thread counts down from 50, waiting a random amount of time between each step.


import thread
import time
import random
import math

def random_wait(thread_name):
    wait_count = 50
    while wait_count > 0:
        time.sleep(random.random() * 3)
        wait_count -= 1
        print "%s wait count: %d" % (thread_name, wait_count)

try:
    thread.start_new_thread(random_wait, ("Thread-1",))
    thread.start_new_thread(random_wait, ("Thread-2",))
    thread.start_new_thread(random_wait, ("Thread-3",))
    thread.start_new_thread(random_wait, ("Thread-4",))
except:
    print "Error: unable to start thread"
    
while 1:
    pass

And the output looks like this:


Thread-4 wait count: 49
Thread-4 wait count: 48
Thread-2 wait count: 49
Thread-3 wait count: 49
Thread-4 wait count: 47
Thread-2 wait count: 48
Thread-4 wait count: 46
Thread-1 wait count: 49
Thread-1 wait count: 48
Thread-1 wait count: 47
Thread-1 wait count: 46
Thread-4 wait count: 45
Thread-2 wait count: 47
Thread-3 wait count: 48
Thread-4 wait count: 44

Let's spice things up a bit. The program will have a globally accessible variable, and each thread will add to it some value. In this case, threads 1 and 2 will add 1 to the variable, whereas threads 3 and 4 will subtract 1 from it.


import thread
import time
import random
import math

global test_var

def random_wait(thread_name, var_change):
    global test_var
    wait_count = 50
    while wait_count > 0:
        time.sleep(random.random())
        wait_count -= 1
        test_var += var_change
        print "%s wait count: %d" % (thread_name, wait_count), "Test var: %d" % (test_var)

try:
    global test_var
    test_var = 0
    thread.start_new_thread(random_wait, ("Thread-1", 1))
    thread.start_new_thread(random_wait, ("Thread-2", 1))
    thread.start_new_thread(random_wait, ("Thread-3", -1))
    thread.start_new_thread(random_wait, ("Thread-4", -1))
except:
    print "Error: unable to start thread"
    
while 1:
    pass

And it produces this output:


Thread-3 wait count: 49 Test var: -1
Thread-4 wait count: 49 Test var: -2
Thread-2 wait count: 49 Test var: -1
Thread-3 wait count: 48 Test var: -2
Thread-1 wait count: 49 Test var: -1
Thread-1 wait count: 48 Test var: 0
Thread-4 wait count: 48 Test var: -1
Thread-4 wait count: 47 Test var: -2
Thread-2 wait count: 48 Test var: -1
Thread-1 wait count: 47 Test var: 0
Thread-3 wait count: 47 Test var: -1
Thread-4 wait count: 46 Test var: -2
Thread-2 wait count: 47 Test var: -1
Thread-4 wait count: 45 Test var: -2
Thread-1 wait count: 46 Test var: -1
Thread-3 wait count: 46 Test var: -2
Thread-2 wait count: 46 Test var: -1
Thread-4 wait count: 44 Test var: -2
Thread-4 wait count: 43 Test var: -3
Thread-3 wait count: 45 Test var: -4
Thread-1 wait count: 45 Test var: -3
Thread-4 wait count: 42 Test var: -4
Thread-3 wait count: 44 Test var: -5
Thread-2 wait count: 45 Test var: -4
Thread-4 wait count: 41 Test var: -5
Thread-1 wait count: 44 Test var: -4
Thread-4 wait count: 40 Test var: -5
Thread-3 wait count: 43 Test var: -6
Thread-3 wait count: 42 Test var: -7
Thread-1 wait count: 43 Test var: -6
Thread-3 wait count: 41 Test var: -7
Thread-4 wait count: 39 Test var: -8
Thread-1 wait count: 42 Test var: -7
Thread-2 wait count: 44 Test var: -6
Thread-2 wait count: 43 Test var: -5
Thread-4 wait count: 38 Test var: -6
Thread-3 wait count: 40 Test var: -7
Thread-1 wait count: 41 Test var: -6
Thread-2 wait count: 42 Test var: -5
Thread-3 wait count: 39 Test var: -6

Notice something here: the global variable ends up well into the negative, meaning threads 3 and 4 got their updates in more often.

 

The catch, though, is the random wait, which simulates the delays that can happen in a complex environment. For a simple task like this, removing the random wait lets all the tasks run at basically the same rate, because they're always ready.

 

But if creating multiple threads is that easy, why is multithreading "hard"?


Well hold on there, coding cowboy! A thread may be a separate unit of execution, but the function it calls can behave just like any other function in the program, including spawning threads of its own. Let's modify the code to show this:


import thread
import time
import random
import math


def random_wait(thread_name, parent_name):
    wait_count = 50
    while wait_count > 0:
        time.sleep(random.random())
        wait_count -= 1
        print "%s - %s wait count: %d" % (parent_name, thread_name, wait_count)

def main_thread(thread_name):
    count = int(math.ceil(random.random()*5))
    print "\nNumber of sub threads for %s: %d" % (thread_name, count)
    for x in xrange(0, count, 1):
        try:
            thread.start_new_thread(random_wait, ("Sub-thread-%d" % (x),thread_name,))
        except:
            print "Error: unable to start thread"
try:

    thread.start_new_thread(main_thread, ("Main-1",))
    thread.start_new_thread(main_thread, ("Main-2",))
    thread.start_new_thread(main_thread, ("Main-3",))
    thread.start_new_thread(main_thread, ("Main-4",))
except:
    print "Error: unable to start thread"
    
while 1:
    pass

The script now spawns threads using "main_thread", each of which creates anywhere from 1 to 5 sub-threads that then do the same thing as in the previous example. The output of this looks like:


Number of sub threads for Main-1: 2
Number of sub threads for Main-2: 1
Number of sub threads for Main-3: 5
Number of sub threads for Main-4: 5

Main-3 - Sub-thread-3 wait count: 49
Main-3 - Sub-thread-3 wait count: 48
Main-1 - Sub-thread-0 wait count: 49
Main-4 - Sub-thread-2 wait count: 49
Main-4 - Sub-thread-1 wait count: 49
Main-3 - Sub-thread-4 wait count: 49
Main-1 - Sub-thread-0 wait count: 48
Main-4 - Sub-thread-4 wait count: 49
Main-4 - Sub-thread-0 wait count: 49
Main-1 - Sub-thread-1 wait count: 49
Main-3 - Sub-thread-2 wait count: 49
Main-4 - Sub-thread-2 wait count: 48
Main-4 - Sub-thread-2 wait count: 47
Main-3 - Sub-thread-1 wait count: 49
Main-3 - Sub-thread-3 wait count: 47
Main-4 - Sub-thread-3 wait count: 49
Main-3 - Sub-thread-0 wait count: 49
Main-2 - Sub-thread-0 wait count: 49
Main-4 - Sub-thread-0 wait count: 48
Main-1 - Sub-thread-0 wait count: 47
Main-3 - Sub-thread-4 wait count: 48
Main-4 - Sub-thread-2 wait count: 46
Main-4 - Sub-thread-4 wait count: 48

In other words, you can have what look like complete programs running on a separate thread. And those threads can spawn other threads. You can see this can get complicated really fast.

 

What makes multithreading hard then?


Resource contention. Ever had to share something that everyone wanted? That's the core problem of multithreading. And it's not just hardware, like multiple threads needing the same hard drive; it's any shared data resource. I'll start with an example and then go over the classes of issues multithreading has.

 

Example time! A case of the bank screwing you (okay, it's really bad programming screwing you)


Let's say you're at the ATM and you want to make a $200 withdrawal. But at the same time, it's payday and the direct deposit system is kicking in to give you $1000. So you have a system that looks like this:


Direct deposit <-> Bank <-> ATM

Since neither the direct deposit system nor the ATM can directly access and modify the bank's data, they each request a copy to modify locally. Say you start with $2000 in the bank. The ATM and the direct deposit system both copy this and work with a balance of $2000. Direct deposit adds $1000, making its local copy $3000. The ATM subtracts $200, making its local copy $1800. Now you have a problem: two versions of the same data. If the direct deposit system saves its new balance first and the ATM saves afterward, your end result is $1800. The other way around, your balance ends up at $3000. The correct balance is $2800.

 

Don't believe this is a thing? Well let's look at this example:


import thread
import time
import random
import math

global balance

def deposit(amount):
    global balance
    local_balance = balance
    time.sleep(math.ceil(random.random()*1))
    local_balance += amount
    balance = local_balance
    
def withdraw(amount):
    global balance
    local_balance = balance
    time.sleep(math.ceil(random.random()*1))
    local_balance -= amount
    balance = local_balance
    

if __name__ == '__main__':
    global balance
    balance = 2000
    
    for i in xrange(0, 10, 1):
        try:
            thread.start_new_thread(deposit, (1000,))
            thread.start_new_thread(withdraw, (200,))
            time.sleep(2)
        except:
            print "Error: unable to start thread"
        print "Balance: $%d" % (balance)
    time.sleep(2)
    print "Final Balance: $%d" % (balance)

In this, there's a shared "balance" variable that the threads use. The script spawns two threads, a "deposit" and a "withdraw" thread. Each makes a local copy of "balance", does a random wait of 0-1 seconds, modifies the local copy, then writes it back to "balance". The main thread waits some time before printing the result of the transactions. This happens 10 times to let the effect compound. At the end, the final balance is printed. Running this script three times, you get these results:


[Run 1]
Balance: $3000
Balance: $2800
Balance: $3800
Balance: $4800
Balance: $5800
Balance: $6800
Balance: $7800
Balance: $7600
Balance: $8600
Balance: $8400
Final Balance: $8400

[Run 2]
Balance: $1800
Balance: $2800
Balance: $2600
Balance: $2400
Balance: $2200
Balance: $3200
Balance: $3000
Balance: $2800
Balance: $2600
Balance: $3600
Final Balance: $3600

[Run 3]
Balance: $3000
Balance: $4000
Balance: $5000
Balance: $4800
Balance: $5800
Balance: $6800
Balance: $6600
Balance: $6400
Balance: $6200
Balance: $6000
Final Balance: $6000

As you can see, every run came out differently. This problem, by the way, is called a race condition, named because the threads are "racing" to finish first.

 

But you may think, for this example, "why bother with a local copy? Why not modify the variable itself?" That helps less than you'd hope. Even a direct "balance += amount" is not atomic: under the hood it's a read, an add, and a write, and if a thread is preempted between the read and the write, another thread's update is silently lost. Multi-core systems make it worse. When threads run on separate cores, each core may be working on its own cached or register-held copy of the data, and depending on which core writes its result back last, you can get the same incorrect output as above.
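Here's a stripped-down Python 3 sketch of that lost update. The sleep stands in for an ill-timed preemption between the read and the write; in real code the window is tiny but nonzero.

```python
import threading
import time

counter = 0

def unsafe_increment():
    global counter
    local = counter            # read
    time.sleep(0.2)            # stand-in for a context switch here
    counter = local + 1        # write: clobbers the other thread's update

threads = [threading.Thread(target=unsafe_increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 1, not 2: one increment was lost
```

Both threads read 0 before either writes, so both write 1 and one increment vanishes.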

 

What issues are there in multithreaded programming?


Though resource contention is the common root of multithreading problems, it comes in several distinct forms that one has to be aware of.

  • Deadlock
    • A thread is waiting for a resource that another thread has, but that thread won't give it up until it acquires another resource, which another thread has.
    • Real-life example: Two nations doing a prisoner exchange meet at a bridge, but the nations won't give up their prisoner until the other one does first.
  • Livelock
    • Multiple threads want a resource, but they're letting another thread have it, who in turn, lets another thread have it.
    • Real-life example: Two people walking into their own paths in a hallway, then play the shuffle game as they try to move past each other.
  • Starvation
    • A lower priority thread waiting on a resource is constantly pre-empted by higher priority threads and never gets to run.
    • Real-life example: You're in the lowerclassmen group in college and you're trying to get into a class that's required, but all the upperclassmen and other "priority registration" groups are taking the available seats.
  • Priority Inversion
    • A higher priority thread that's waiting for a resource is effectively stalled behind a lower priority thread that holds the resource.
    • Real-life example: You need to use the computer to write up a school report but your sibling has the computer to play games on.
  • Race Condition
    • Multiple threads are sharing the same resource, and by the time they're done, the incorrect result is saved because they weren't aware of what each other were doing.
    • Real-life example: Two people are adding something to a Wikipedia page. When they both save, the one who saved last has their content posted instead. (This is probably fixed)
  • Non Atomic Operations
    • An atomic operation is something that can be done without being interrupted. This is either something so basic that it can't be interrupted, like adding two numbers together, or interrupts are explicitly disabled somehow. Non-atomic operations are interruptible, which can cause a thread who was working on a resource to be interrupted, only for another thread to modify that resource before handing back control to the original thread. Note that switching threads (context switch) due to a thread running out of time on the processor is considered interrupting.
    • Real-life example: A captain of an airplane is reading off a navigation chart, but has to use the restroom. The co-pilot takes over, receives new information, and changes the navigation chart. When the captain comes back, if they were not aware of the change, then something bad could happen.
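For deadlock in particular, the classic defense is to pick one global order for acquiring locks and stick to it everywhere. A Python 3 sketch (the lock and thread names are my own):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
transfers_done = []

def transfer(name, first, second):
    # Both threads acquire locks in the same global order (lock_a, then
    # lock_b), no matter which direction the transfer goes. If each thread
    # instead grabbed "its own" lock first, they could deadlock waiting
    # on each other.
    with first:
        with second:
            transfers_done.append(name)

t1 = threading.Thread(target=transfer, args=("a_to_b", lock_a, lock_b))
t2 = threading.Thread(target=transfer, args=("b_to_a", lock_a, lock_b))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(transfers_done))  # ['a_to_b', 'b_to_a']
```

With a single acquisition order, no circular wait can form, which removes one of the necessary conditions for deadlock.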

 

How do you resolve multithreading issues?


There are several ways to manage multithreading issues. I'll discuss two of them:

 

Locking

Locking is declaring exclusive access to a resource, and it breaks down into two types: the mutex and the semaphore. The threads still share some kind of locking variable, but access to it is atomic. That is, taking or releasing the lock cannot be interrupted partway through, either because it's a single indivisible operation (the way adding two numbers is, since you can't split it up any further) or because interrupts are explicitly disabled while it happens.

 

In the case of a mutex, a thread grabs it when it needs to access a shared resource and releases it when it's done. If a thread tries to grab the mutex while another has it, the thread either waits until it's free or comes back another time. A semaphore generalizes this with a counter of available slots: acquiring the semaphore decrements the counter, releasing it increments the counter, and a thread that tries to acquire it while the counter is 0 must wait. A mutex is effectively a semaphore whose count is 1. Semaphores are useful when a limited number of threads, rather than exactly one, may use the shared resource at the same time.
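Here's a Python 3 sketch of a counting semaphore limiting a pool of six workers to two at a time; the helper names (slots, peak, meter) are my own:

```python
import threading
import time

# A semaphore initialized to 2 lets at most two threads into the
# critical section at once; a mutex is the special case of count 1.
slots = threading.Semaphore(2)
active = 0
peak = 0
meter = threading.Lock()   # protects the bookkeeping counters

def worker():
    global active, peak
    with slots:                 # blocks while the semaphore count is 0
        with meter:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)        # hold the slot briefly
        with meter:
            active -= 1

threads = [threading.Thread(target=worker) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak)  # never exceeds 2
```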

 

Using our bank example, let's add a mutex so that when withdrawing or depositing money, the mutex is taken before modifying the balance:


import thread
import time
import random
import math
from threading import Lock

global balance
mutex = Lock()

def deposit(amount):
    global balance
    with mutex:
        time.sleep(math.ceil(random.random()*1))
        balance += amount
    
def withdraw(amount):
    global balance
    with mutex:
        time.sleep(math.ceil(random.random()*1))
        balance -= amount
    

if __name__ == '__main__':
    global balance
    balance = 2000
    
    for i in xrange(0, 10, 1):
        try:
            thread.start_new_thread(deposit, (1000,))
            thread.start_new_thread(withdraw, (200,))
            time.sleep(2)
        except:
            print "Error: unable to start thread"
        print "Balance: $%d" % (balance)
    time.sleep(2)
    print "Final balance: $%d" % (balance)

The printout shows it can produce funky intermediate results, but the final result is always the correct answer: $10000

 


[Run 1]
Balance: $3000
Balance: $3800
Balance: $4600
Balance: $5400
Balance: $6200
Balance: $7000
Balance: $7800
Balance: $8600
Balance: $9400
Balance: $10200
Final balance: $10000

[Run 2]
Balance: $3000
Balance: $3600
Balance: $4400
Balance: $5400
Balance: $6000
Balance: $6800
Balance: $7600
Balance: $8400
Balance: $9200
Balance: $10000
Final balance: $10000

[Run 3]
Balance: $3000
Balance: $3800
Balance: $4600
Balance: $5400
Balance: $6200
Balance: $7000
Balance: $7800
Balance: $8600
Balance: $9400
Balance: $10200
Final balance: $10000

The advantage of locking is that it's very cheap: you only need one variable for a lock, taking and releasing it are simple operations, and checking it is quick. The problem is that a waiting thread sits alive but doing nothing until the lock is released. Another problem is that misusing locks is how you create deadlocks and livelocks.

 

Isolating the resource to a manager

A manager is something that has exclusive control over the resource and handles how it's modified. While this sounds like the resource is no longer being shared, all the other components need is for the data to get modified; they just need it done safely. In the bank example I've been using, I'll create a bank object that manages a person's balance. The "deposit" and "withdraw" functions instead post a transaction request to the bank object, and at some point the bank processes the transactions.


import thread
import time
import random
import math

class bank:
    def __init__(self, starting_balance):
        self.balance = starting_balance
        self.transactions = []
        
    def print_balance(self):
        return self.balance
        
    def post_transaction(self, amount):
        self.transactions.append(amount)
        
    def process_transactions(self):
        while len(self.transactions) > 0:
            self.balance += self.transactions.pop()
            
def deposit(amount, bank):
    time.sleep(math.ceil(random.random()*1))
    print "\nDepositing $%d" % amount
    bank.post_transaction(amount)
    
def withdraw(amount, bank):
    time.sleep(math.ceil(random.random()*1))
    print "\nWithdrawing $%d" % amount
    bank.post_transaction(-amount)
    

if __name__ == '__main__':
    the_bank = bank(2000)
    
    for i in xrange(0, 10, 1):
        try:
            thread.start_new_thread(deposit, (1000,the_bank,))
            thread.start_new_thread(withdraw, (200,the_bank,))
        except:
            print "Error: unable to start thread"
    time.sleep(10)
    the_bank.process_transactions()
    print "Final balance: $%d" % the_bank.print_balance()

Keep in mind I'm not aiming for completeness here, so there are some issues with the object. However, the basic idea still exists. "deposit" and "withdraw" now add their transactions to a queue in the bank object, which will process them later. The end result is still the same:


[Run 1]
Withdrawing $200
Withdrawing $200
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Depositing $1000
Withdrawing $200
Depositing $1000
Withdrawing $200
Depositing $1000
Depositing $1000
Withdrawing $200
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Withdrawing $200
Depositing $1000
Depositing $1000
Final balance: $10000

[Run 2]
Withdrawing $200
Depositing $1000
Depositing $1000
Depositing $1000
Withdrawing $200
Withdrawing $200
Withdrawing $200
Withdrawing $200
Withdrawing $200
Depositing $1000
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Depositing $1000
Depositing $1000
Withdrawing $200
Depositing $1000
Depositing $1000
Withdrawing $200
Final balance: $10000

[Run 3]
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Depositing $1000
Depositing $1000
Depositing $1000
Withdrawing $200
Withdrawing $200
Depositing $1000
Depositing $1000
Withdrawing $200
Withdrawing $200
Final balance: $10000

A manager-based setup needs no locking in the client code; the coordination all lives in one place. (Strictly speaking, the transaction list itself must be safe to append to from multiple threads. CPython's global interpreter lock happens to make list.append atomic here, but in general you'd use a thread-safe queue.) It avoids race conditions on the balance because only the bank object ever modifies it. Note also that list.pop() takes transactions off the end of the list, so they're processed newest-first; that's fine for deposits and withdrawals, which can be applied in any order, but an order-sensitive manager would want a proper FIFO queue. The trade-off is that it's more resource intensive, and the bank object needs to process the data in a timely manner, otherwise the queue fills up and things start to fall apart.
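In Python 3, the same manager idea is usually built on the standard library's queue.Queue, which does the thread-safe hand-off for you. A sketch of the bank example in that style (bank_manager and the transaction counts are my own choices):

```python
import threading
import queue

# The manager thread owns the balance; everyone else just posts
# transactions onto a thread-safe queue.
transactions = queue.Queue()
result = {}

def bank_manager(starting_balance, expected):
    balance = starting_balance
    for _ in range(expected):
        balance += transactions.get()   # blocks until a transaction arrives
    result["balance"] = balance

manager = threading.Thread(target=bank_manager, args=(2000, 20))
manager.start()

# Ten $1000 deposits and ten $200 withdrawals, each from its own thread.
clients = []
for _ in range(10):
    clients.append(threading.Thread(target=transactions.put, args=(1000,)))
    clients.append(threading.Thread(target=transactions.put, args=(-200,)))
for c in clients:
    c.start()
for c in clients:
    c.join()
manager.join()
print(result["balance"])  # always 10000
```

Because queue.Queue is FIFO and internally locked, the manager never needs to poll or guess when transactions are pending.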

So there you have it! A primer on the wonders of programming with multiple threads. In case you're wondering, the examples were written in Python 2.7, so you can run them yourself if you want.


I knew most of this already but the post is informative nonetheless! Good job.



10 minutes ago, SCHISCHKA said:

there's also this https://en.wikipedia.org/wiki/Embarrassingly_parallel

As soon as you introduce some logic into your program it stops being easy

This works well if the data sets are independent, but sometimes they're not. That's why I used the bank example. There's no logic going on but you can still be screwed (or not) without resource control.


You might want to add a bit about "atomicity of operations" in the issues section. It's a special version of a race which deserves some explanation on its own. In the ATM example you might end up with neither 1800 nor 3000 but some twisted combination of both because of this.


You got me interested...but I have a paper to write so I'll come back to finish reading later.

 

(leaving this comment so I can find the thread later)

