COMP6991 - Solving Modern Programming Problems with Rust
Final Exam for 23T1
With my own solutions (which might be incorrect)
Question
Question:
Explain the difference between the &str and String types, with respect to ownership.
&str is a string slice: a borrowed view into string data (often owned by a String, or a literal baked into the binary). It does not own that data.
String is an owned, growable string type: it owns its character data on the heap and is responsible for freeing it when dropped.
Explain how the behaviour regarding lifetimes differ between &str and String.
String does not need a lifetime annotation; it simply lives until its owner goes out of scope and it is dropped.
&str is a reference, so it carries a lifetime, and that lifetime must be no longer than that of the data it refers to.
Outline an example of when a &str might be used instead of a String.
Read-only access suits &str: a function that only needs to inspect a string should take &str. This avoids copying and does not take ownership, so no ownership issues arise for the caller.
Outline an example of when a String might be used instead of a &str.
When you want to modify the string or own the data (for example, to store it in a struct or return it), you should use String.
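A small sketch of both cases (my own illustration, not from the exam):

```rust
// Read-only access: take &str, so the caller keeps ownership.
fn print_len(s: &str) {
    println!("{} bytes", s.len());
}

// Producing/owning new data: return a String.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    print_len(&owned);      // &String coerces to &str
    print_len("literal");   // string literals are &'static str
    let loud = shout(&owned);
    println!("{loud}");
}
```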
Related information
The following Rust code fails to compile, but equivalent code in other popular programming languages (e.g. C, Java, Python) compiles and/or works correctly.
RUSTfn main() {
let my_string = String::from("Hello, World!");
// print the string twice
print_string(my_string);
print_string(my_string);
}
fn print_string(string: String) {
println!("{string}");
}
Question:
Explain what issue(s) prevent the Rust compiler from building this code
my_string is a String, so the first call print_string(my_string); moves its ownership into print_string. The second print_string(my_string) call cannot use my_string because the value has already been moved out of main: each value has exactly one owner, and once ownership has been given away it cannot be used again.
Explain the philosophy behind this language decision.
The ownership system protects memory safety: Rust aims to rule out, at compile time, bugs related to dangling pointers, double frees, use-after-free, and other memory problems, and the single-owner rule that makes the move above an error is part of what enables those guarantees.
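For illustration, two common ways to make the example compile (my own sketch): borrow instead of moving, or clone so each call gets its own owned copy.

```rust
fn main() {
    let my_string = String::from("Hello, World!");

    // Fix 1: borrow the string instead of moving it.
    print_string(&my_string);
    print_string(&my_string);

    // Fix 2: clone it, so the first call consumes a copy
    // and the second call consumes the original.
    print_string_owned(my_string.clone());
    print_string_owned(my_string);
}

fn print_string(string: &str) {
    println!("{string}");
}

fn print_string_owned(string: String) {
    println!("{string}");
}
```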
Question
Question:
Idiomatic Rust programs often extensively use enums when considering errors or other failure cases. Select another programming language of your choice (e.g. C, Java, Python, Go, etc.), and contrast its equivalent error-handling features with Rust.
Provide at least three individual points of contrast (3 marks), along with an overall judgement as to their merits and disadvantages. (1 mark)
NOTE:
Any reasonable programming language is an acceptable choice for this question.
Esoteric languages (e.g., Brainf***, INTERCAL, etc.) are not acceptable, and will not receive marks.
You must select a programming language. Languages that do not involve programming (e.g. markup languages (HTML, Markdown), query languages (SQL, jq), etc.) are not acceptable, and will not receive marks.
Programming languages that are slightly more niche (e.g. Haskell, Zig, Swift, Lisp) are perfectly acceptable.
If you are unsure whether your programming language is "reasonable", simply default to C.
I will compare Rust with Python.

1. Rust uses the Result<T, E> and Option<T> enums to handle errors. An Err in a Result or a None in an Option forces the calling function to deal with the failure value explicitly, rather than panicking. In Python, an unexpected situation raises an exception instead; if nothing catches it, the program crashes, which can be hard to debug.
2. In Rust the possible failure is part of the function's type, so callers must handle Err or None at each step, which can become verbose and less clear in complex call chains. Python usually keeps error handling in separate try/except blocks, but a function's exceptions are not part of its signature, so in a large program you need documentation to understand what can be raised and what it means.
3. Result and Option provide a high level of safety, preventing common errors such as null dereferences and ensuring all error cases are handled. Python's exceptions offer flexibility and ease of use, as they allow developers to separate error handling from business logic, and to pass errors across multiple layers of function calls effortlessly. This flexibility, however, can lead to uncaught exceptions if the developer is not careful.

Overall Judgement:
Both Rust's enum-based error handling and Python's exception handling have their merits and disadvantages. Rust’s method is safer and makes errors explicit but can lead to more verbose code and a steeper learning curve. Python’s exception handling is more flexible and can lead to cleaner main logic paths, but it requires careful management to avoid unhandled exceptions and potential crashes.
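A small Rust sketch of the enum-based style described above (my own illustration):

```rust
use std::num::ParseIntError;

// The possible failure is visible in the return type.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

fn main() {
    // The caller is forced to deal with the Err case explicitly.
    match parse_port("8080") {
        Ok(port) => println!("listening on {port}"),
        Err(e) => eprintln!("invalid port: {e}"),
    }
}
```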
Related information
Boris is attempting to write a function split_once, which takes two string slices: string and split_on. If split_on occurs within string, it will return Some tuple of the slice of string that exists before the first occurrence of split_on, and the remainder of string after the first occurrence.
Boris is certain that he has written the implementation of the function correctly, but is not so confident in Rust when it comes to lifetimes. Sadly, his code does not compile currently.
RUSTpub fn split_once<'a, 'b>(string: &'a str, split_on: &'b str) -> Option<(&'b str, &'a str)> {
let index = string.find(split_on)?;
let tuple = (&string[..index], &string[index + split_on.len()..]);
Some(tuple)
}
Your task is to fix the lifetimes issue with Boris' broken code. You only have to modify lifetimes in order to solve the issue, and thus you are not permitted to change any code except that relevant to lifetimes in src/lib.rs. You are permitted to add, remove, and modify lifetimes in this exercise.
This is an example of the expected behaviour:
$ 6991 cargo run --bin exam_q2
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q2`
Some(("hello", "world"))
$ 6991 cargo run --bin exam_q2_alt
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q2_alt`
Some(("hello", "world"))
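My fix (changing only the lifetimes): both returned slices are sub-slices of string, so they must both borrow with string's lifetime 'a. The lifetime of split_on is unrelated to the return value, so 'b can simply be removed and left elided.

```rust
pub fn split_once<'a>(string: &'a str, split_on: &str) -> Option<(&'a str, &'a str)> {
    let index = string.find(split_on)?;
    let tuple = (&string[..index], &string[index + split_on.len()..]);
    Some(tuple)
}
```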
Question
Luke is trying to define a library of sports cars for use in his Rust programs. He decides that he would like to make use of the trait system to do so, and so defines the following trait:
RUSTpub trait Car {
fn brand(&self) -> CarBrand;
fn horsepower(&self) -> u32;
}
pub enum CarBrand {
Toyota,
Subaru,
Nissan,
}
Because defining a type and implementing a trait for each individual car requires a substantial amount of boilerplate, he also defines a macro to make the process simpler:
RUST#[macro_export]
macro_rules! car {
($name:ident, $brand:expr, $horsepower:literal) => {
pub struct $name;
impl Car for $name {
fn brand(&self) -> CarBrand {
use CarBrand::*;
$brand
}
fn horsepower(&self) -> u32 {
$horsepower
}
}
}
}
And so defines the individual cars like such:
RUSTcar!(Corolla, Toyota, 100);
car!(Cressida, Toyota, 160);
car!(Chaser, Toyota, 220);
car!(Liberty, Subaru, 100);
car!(Impreza, Subaru, 130);
car!(Wrx, Subaru, 200);
car!(Pulsar, Nissan, 90);
car!(Silvia, Nissan, 200);
car!(Skyline, Nissan, 220);
This all seems to have worked correctly, so he creates his first function, favourite_car, which given a particular CarBrand will return back his favourite car model from that brand:
RUSTfn favourite_car(brand: CarBrand) -> impl Car {
use CarBrand::*;
match brand {
Toyota => Cressida,
Subaru => Liberty,
Nissan => Skyline,
}
}
However, attempting to build this leads Luke into a problem:
$ 6991 cargo build
   Compiling cars v0.1.0
error[E0308]: `match` arms have incompatible types
 --> src/favourite.rs:8:13
  |
6 | /     match brand {
7 | |         Toyota => Cressida,
  | |                   -------- this is found to be of type `cars::Cressida`
8 | |         Subaru => Liberty,
  | |                   ^^^^^^^ expected struct `cars::Cressida`, found struct `cars::Liberty`
9 | |         Nissan => Skyline,
10| |     }
  | |_____- `match` arms have incompatible types
Question
Explain why Luke is receiving this error message, even with an impl Car return type.
Rust must know a single, concrete return type at compile time. impl Car does not mean "any type implementing Car"; it means "some one specific type implementing Car", chosen by the function body. Here the match arms produce three different concrete types, Cressida, Liberty, and Skyline, so the compiler cannot pick one type for the return value and reports the arms as having incompatible types.
Explain two issues Rust (the language) would face if this behaviour were to be allowed.
First, type safety and inference would break down: the compiler could no longer know which concrete type a call returns, so it could not check how the returned value is used, leading to unpredictability in type handling. Second, code generation would become a problem: different concrete types can have different sizes and layouts, so the compiler could not know how much space the return value needs or which method implementations to call without introducing some form of dynamic dispatch.
Provide an example of a possible fix for Luke's issue.
He could use a trait object as the return type instead of impl Trait. This allows returning different concrete types because the call is resolved through dynamic dispatch at runtime.
RUSTfn favourite_car(brand: CarBrand) -> Box<dyn Car> {
use CarBrand::*;
match brand {
Toyota => Box::new(Cressida),
Subaru => Box::new(Liberty),
Nissan => Box::new(Skyline),
}
}
Question
Luke decides that his car! macro still has a bit too much boilerplate. Instead of defining cars individually, he would like to only perform one macro invocation for a whole brand of cars at a time.
That is, he would like to modify his macro such that the following code works equivalently:
RUSTcar! {
brand = Toyota;
models = [
Corolla = 100,
Cressida = 160,
Chaser = 220,
];
}
car! {
brand = Subaru;
models = [
Liberty = 100,
Impreza = 130,
Wrx = 200,
];
}
car! {
brand = Nissan;
models = [
Pulsar = 90,
Silvia = 200,
Skyline = 220,
];
}
Rewrite Luke's macro as described. Provide the full macro_rules! definition in your response. (3 marks)
RUST#[macro_export]
macro_rules! car {
// Match the overall structure for the brand and models definition
(
brand = $brand:expr;
models = [
$( $name:ident = $horsepower:literal ),* $(,)*
];
) => {
// Iterate over each model definition
$(
// Declare each car model as a struct
pub struct $name;
impl Car for $name {
fn brand(&self) -> CarBrand {
// Use the passed brand directly in the generated code
use CarBrand::*;
$brand
}
fn horsepower(&self) -> u32 {
// Use the provided horsepower for each model
$horsepower
}
}
)*
};
}
Question
Luke would finally like to parameterise the Car trait to include type-level information about the car's transmission layout. There are three possible transmission layouts in Luke's library: front-wheel drive, rear-wheel drive, and all-wheel drive.
Luke knows that most cars are only sold with one choice of transmission layout. For example, all Wrxs are all-wheel drive, and all Silvias are rear-wheel drive.
However, some cars may support different transmission layouts. For example, while Corollas were originally rear-wheel drive, newer Corollas can come in both front-wheel drive and all-wheel drive!
With this in mind, Luke settles on two options for his trait design:
RUST// Layouts
struct RearWheelDrive;
struct FrontWheelDrive;
struct AllWheelDrive;
// Trait definition:
// Option 1
pub trait Car<Layout> {
fn brand(&self) -> CarBrand;
fn horsepower(&self) -> u32;
}
// Option 2
pub trait Car {
type Layout;
fn brand(&self) -> CarBrand;
fn horsepower(&self) -> u32;
}
Question
Explain the differences between Luke's two options.
In the first option, the Car trait is defined with a generic type parameter, Layout. A type names the layout at each impl, and the same car model may implement Car<Layout> several times, once for each layout it supports (for example both Car<FrontWheelDrive> and Car<AllWheelDrive>).
In the second option, the Car trait uses an associated type for the layout. This means the layout is fixed for each implementation of the Car trait and cannot vary between instances of the same type. The layout type is part of the trait's implementation and thus is defined once per struct/type, not per instance.
With the information provided, which option is more suitable for Luke's case?
Given that some car models in Luke's library may support different transmission layouts (like the Corolla example), Option 1, using a generic parameter, is more suitable. It gives the necessary flexibility: a single car model can implement Car<Layout> once for each layout it is sold with, which reflects the real-world situation where the same model is available in several configurations.
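A minimal sketch of the difference (my own illustration; the traits are renamed CarGeneric and CarAssoc here so the two designs can sit side by side):

```rust
pub enum CarBrand { Toyota, Nissan }

pub struct FrontWheelDrive;
pub struct AllWheelDrive;
pub struct RearWheelDrive;

// Option 1: generic parameter -- one model may implement the trait several times.
pub trait CarGeneric<Layout> {
    fn brand(&self) -> CarBrand;
    fn horsepower(&self) -> u32;
}

pub struct Corolla;
impl CarGeneric<FrontWheelDrive> for Corolla {
    fn brand(&self) -> CarBrand { CarBrand::Toyota }
    fn horsepower(&self) -> u32 { 100 }
}
impl CarGeneric<AllWheelDrive> for Corolla {
    fn brand(&self) -> CarBrand { CarBrand::Toyota }
    fn horsepower(&self) -> u32 { 100 }
}

// Option 2: associated type -- exactly one layout per implementing type.
pub trait CarAssoc {
    type Layout;
    fn brand(&self) -> CarBrand;
    fn horsepower(&self) -> u32;
}

pub struct Silvia;
impl CarAssoc for Silvia {
    type Layout = RearWheelDrive;
    fn brand(&self) -> CarBrand { CarBrand::Nissan }
    fn horsepower(&self) -> u32 { 200 }
}
// A second `impl CarAssoc for Silvia` with a different Layout would be
// rejected as a conflicting implementation.
```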
Question
The following program attempts to launch two new threads, which each print a shared string.
RUSTuse std::thread;
fn main() {
let the_string = String::from("Hello, World!");
let handle_1 = thread::spawn(|| {
print_string(&the_string);
});
let handle_2 = thread::spawn(|| {
print_string(&the_string);
});
handle_1.join().unwrap();
handle_2.join().unwrap();
}
fn print_string(string: &str) {
println!("{string}");
}
However, the code does not compile.
Note: the signature of std::thread::spawn
is as follows:
RUSTpub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
F: FnOnce() -> T + Send + 'static,
T: Send + 'static,
{
...
}
Question:
Explain why the code does not compile.
the_string is created inside the main function, so it only lives until the end of main's scope. The closures passed to thread::spawn capture the_string by reference, but a spawned thread may outlive main, and the compiler cannot prove that it won't. Rust therefore cannot guarantee that the_string will still exist when the threads use those references, so borrowing it into the spawned closures is rejected.
Identify the specific part of std::thread::spawn's signature that is responsible for this compiler error. (1 mark)
The bound on the closure, F: FnOnce() -> T + Send + 'static, is responsible, and in particular the 'static part: the closure must be Send so it can be moved to another thread, and everything it captures must either be owned by the closure or live for the whole program. A borrow of the local the_string is not 'static, which is why the compiler rejects the code.
Explain two possible solutions to fix this issue.
1. Wrap the string in an Arc and move a clone of the Arc into each closure, so the threads share ownership of the_string (sketched below).
2. Move ownership of the data from main into the threads, for example by cloning the String and moving one copy into each spawned closure.
to threads.Question
When lock() is called on a std::sync::Mutex, a Result containing a MutexGuard is returned.
Question:
How does the Mutex automatically unlock itself?
When the MutexGuard goes out of scope, the mutex is automatically unlocked. MutexGuard is a smart pointer that implements the Drop trait; its destructor runs at the end of the guard's scope, and that drop() implementation is what releases the lock.
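A tiny sketch of that behaviour (my own illustration):

```rust
use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0);

    {
        let mut guard = counter.lock().unwrap(); // lock acquired here
        *guard += 1;
    } // guard goes out of scope here -> Drop runs -> mutex unlocked

    // We can lock it again because the guard above released the lock.
    println!("{}", *counter.lock().unwrap()); // prints 1
}
```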
How does Rust statically prevent programmers from extracting the guarded value out of a MutexGuard?
MutexGuard never hands out ownership of the protected value, only borrows of it: Deref and DerefMut return references whose lifetimes are tied to the guard itself. Because you only ever hold a reference, you cannot move the inner value out (that would require ownership, and the value's one owner is the Mutex), and the borrow checker prevents any such reference from outliving the guard.
Question
The signature of std::iter::Iterator::map
looks approximately like the following:
RUSTpub trait Iterator {
type Item;
...
fn map<B, F>(self, f: F) -> Map<Self, F>
where
F: FnMut(Self::Item) -> B,
{
...
}
}
However, the rayon crate's equivalent function, rayon::iter::ParallelIterator::map
contains some subtle differences:
RUSTpub trait ParallelIterator: Send {
type Item: Send;
...
fn map<F, R>(self, map_op: F) -> Map<Self, F>
where
F: Fn(Self::Item) -> R + Sync + Send,
R: Send,
{
...
}
}
Question:
Justify the change in type bounds to the Item associated type.
Rayon runs the iterator's work on multiple threads, so individual Items have to be moved between threads. Requiring Item: Send ensures that ownership of each item can be transferred to another thread without breaking Rust's safety rules.
Justify the change in type bounds involving Send and Sync within fn map.
The closure F may be called from multiple threads simultaneously, so it must be Sync (safe to share by reference across threads), and it must be Send for the same reason as before: it has to be moved to the worker threads.
The result type R must also be Send, because the results may need to be transferred across threads (e.g., when collecting or reducing the output of a parallel computation).
Justify the change in function trait (FnMut to Fn) for the provided closure to map.
Using Fn rather than FnMut means the closure cannot mutate captured state, so no invocation can depend on mutable state from a previous invocation. That is exactly what is required when the closure is called concurrently from many threads: mutable access from several threads at once would be a data race, whereas an Fn closure only needs shared access.
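For reference, a small usage sketch (assumes the rayon crate is added as a dependency):

```rust
use rayon::prelude::*;

fn main() {
    let words = vec!["hello", "parallel", "world"];

    // The closure is Fn + Send + Sync: it captures no mutable state, so
    // rayon can call it from many worker threads at once. The items and
    // the usize results are both Send.
    let lengths: Vec<usize> = words.par_iter().map(|w| w.len()).collect();

    println!("{lengths:?}");
}
```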
Question
In this question, you will implement a basic in-memory cache.
A cache is a data-type which when given some input, will perform some calculation and return the result. However, if the calculation for that specific input had already been performed previously, the cache will simply return the result of the previous calculation, instead of doing the hard work again.
You have been given src/lib.rs, which describes a very simple cache primitive:
RUSTuse std::{collections::HashMap, rc::Rc};
pub struct Cache {
calculator: fn(&String) -> String,
cache_map: HashMap<String, Rc<String>>,
}
impl Cache {
pub fn new(calculator: fn(&String) -> String) -> Self {
Cache {
calculator,
cache_map: HashMap::new(),
}
}
pub fn get(&mut self, key: String) -> Rc<String> {
if let Some(value) = self.cache_map.get(&key) {
Rc::clone(value)
} else {
let value = Rc::new((self.calculator)(&key));
self.cache_map.insert(key, Rc::clone(&value));
value
}
}
}
This code correctly implements an in-memory cache that works only for key-value pairs of type String. That is, the cache will work for a calculator of type fn(&String) -> String, and allow you to query/fill the cache with the get function turning a String key into an Rc<String>
value.
Your task is to make this cache more generic. You must modify the type of the calculator from a function pointer of &String -> String
into a closure of &Key -> Value
for any possible types Key and Value. As part of this, you must also decide which type of closure is the most generic choice out of FnOnce, FnMut and Fn.
This will also require you to change the generic parameters within the HashMap from <String, Rc<String>>
into <Key, Rc<Value>>
, and similarly within the get function, as previously described in the calculator closure.
You will be required to constrain some generic type parameters as you solve the exercise. You must ensure that you do not overly constrain the types, only requiring what is minimally needed.
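My attempt at the generic version (a sketch): FnMut is the most generic closure trait that still allows the calculator to be called repeatedly (FnOnce could only be called once, and every Fn is also an FnMut), and only Key needs Eq + Hash for the HashMap operations.

```rust
use std::{collections::HashMap, hash::Hash, rc::Rc};

pub struct Cache<Key, Value, Calc> {
    calculator: Calc,
    cache_map: HashMap<Key, Rc<Value>>,
}

impl<Key, Value, Calc> Cache<Key, Value, Calc>
where
    Key: Eq + Hash,
    Calc: FnMut(&Key) -> Value,
{
    pub fn new(calculator: Calc) -> Self {
        Cache {
            calculator,
            cache_map: HashMap::new(),
        }
    }

    pub fn get(&mut self, key: Key) -> Rc<Value> {
        if let Some(value) = self.cache_map.get(&key) {
            Rc::clone(value)
        } else {
            let value = Rc::new((self.calculator)(&key));
            self.cache_map.insert(key, Rc::clone(&value));
            value
        }
    }
}
```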
Question
Question:
Is it possible to invoke undefined behaviour in Rust? If so, how? If not, why?
Yes. When you use unsafe code (dereferencing raw pointers, calling unsafe functions, FFI, and so on), you might invoke undefined behaviour, because the compiler does not check that the invariants such code relies on are actually upheld; that responsibility shifts to the programmer. In purely safe Rust, undefined behaviour should not be possible.
Explain the difference between an unsafe block and an unsafe function in Rust.
Unsafe block: a region of code, inside any function (whether or not that function is marked unsafe), where unsafe operations such as dereferencing a raw pointer are permitted. The author of the block asserts that the required invariants hold.
Unsafe function: a function that is itself marked unsafe. It can perform unsafe operations in its body, and, more importantly, it tells callers that they must uphold extra preconditions, so calling it is only allowed inside an unsafe block. The unsafety of an unsafe function is explicitly exposed to the caller, who becomes responsible for wrapping the call in an unsafe block.
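A small sketch of the distinction (my own illustration):

```rust
/// An unsafe function: callers must guarantee that `ptr` is valid,
/// so every call site needs an `unsafe` block.
unsafe fn read_first(ptr: *const i32) -> i32 {
    *ptr
}

fn main() {
    let xs = [1, 2, 3];

    // An unsafe block inside an ordinary (safe) function.
    let first = unsafe { read_first(xs.as_ptr()) };
    println!("{first}");
}
```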
Describe the technique of unsafe implementations behind safe abstractions, and explain how it reduces Rust's scope for unsoundness.
The technique is to hide unsafe code behind a safe public API: users just call the safe API and never need to know whether unsafe code is used underneath. It lets library authors use unsafe internally, to gain performance, call code written in other languages, or manipulate memory directly, while taking on the obligation that nothing a user can do through the safe interface breaks Rust's safety rules.
Users can only reach the operations the developers have exposed, so they cannot stumble into the unsafe code and misuse it. This shrinks Rust's scope for unsoundness to the small, auditable regions of unsafe code inside such abstractions; everything outside them is checked by the compiler as usual.
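A toy sketch of the pattern (my own illustration): the unsafe code is confined behind a safe function whose checks make it impossible to misuse from the outside.

```rust
pub fn first_element(slice: &[i32]) -> Option<i32> {
    if slice.is_empty() {
        None
    } else {
        // SAFETY: we just checked the slice is non-empty, so index 0 is
        // in bounds and the pointer is valid for reads.
        Some(unsafe { *slice.as_ptr() })
    }
}

fn main() {
    assert_eq!(first_element(&[10, 20]), Some(10));
    assert_eq!(first_element(&[]), None);
}
```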
Question
The following code attempts to write an (inefficient, but simple) implementation of a channel using unsafe. In particular, it implements a bounded-buffer single-producer single-consumer channel where the bound is always 1.
RUSTpub fn channel() -> (Sender<i32>, Receiver<i32>) {
let buffer = Box::into_raw(Box::new(None::<i32>));
let hung_up = Box::into_raw(Box::new(false));
let sender = Sender { buffer, hung_up };
let receiver = Receiver { buffer, hung_up };
(sender, receiver)
}
pub struct Sender<T> {
buffer: *mut Option<T>,
hung_up: *mut bool,
}
pub struct Receiver<T> {
buffer: *mut Option<T>,
hung_up: *mut bool,
}
impl<T> Sender<T> {
pub fn send(&mut self, value: T) -> Option<()> {
if unsafe { *self.hung_up } {
return None;
}
// wait until the channel is empty...
loop {
let value = unsafe { std::ptr::read(self.buffer) };
if value.is_none() { break; }
std::mem::forget(value);
}
// send the value into the shared buffer
unsafe { std::ptr::write(self.buffer, Some(value)); }
Some(())
}
}
impl<T> Receiver<T> {
pub fn recv(&mut self) -> Option<T> {
loop {
if unsafe { *self.hung_up } {
return None;
}
// wait until the value exists...
if let Some(value) = unsafe { std::ptr::read(self.buffer) } {
// clear the channel for the next message
unsafe { std::ptr::write(self.buffer, None); }
return Some(value);
}
}
}
}
unsafe impl<T: Send> Send for Sender<T> {}
unsafe impl<T> Send for Receiver<T> {}
impl<T> Drop for Sender<T> {
fn drop(&mut self) {
unsafe { *self.hung_up = true; }
}
}
impl<T> Drop for Receiver<T> {
fn drop(&mut self) {
unsafe { *self.hung_up = true; }
}
}
It can be used like this:
RUSTuse exam_q6_lib::channel;
fn main() {
std::thread::scope(|scope| {
let (mut send, mut recv) = channel();
scope.spawn(move || {
while let Some(num) = recv.recv() {
println!("Thread got {num}!");
}
println!("Thread finished!");
});
for i in 1..=5 {
println!("Sending {i}...");
send.send(i);
}
println!("Sending finished!");
drop(send);
});
}
$ 6991 cargo run
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q6`
Sending 1...
Sending 2...
Sending 3...
Thread got 1!
Sending 4...
Thread got 2!
Thread got 3!
Thread got 4!
Sending 5...
Thread got 5!
Sending finished!
Thread finished
However, sometimes when it is run, the thread never receives the final message!
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q6`
Sending 1...
Sending 2...
Sending 3...
Thread got 1!
Thread got 2!
Thread got 3!
Sending 4...
Sending 5...
Sending finished!
Thread got 4!
Thread finished
Question:
Perform a code review on the channel implementation (not the main function), carefully examining the usage of unsafe and make a judgement on the overall soundness.
Buffer access: the buffer is handled through a raw pointer, giving both the sender and the receiver direct access to the same memory with no synchronisation (no atomics, locks, or fences). Beyond the usual risk of dereferencing an invalid pointer, this unsynchronised concurrent access is a data race, which is undefined behaviour, so these unsafe blocks are not sound as written.
Hung-up flag: the hung_up flag is similarly accessed via a raw pointer from both threads. This is another critical point because it directly controls the flow (whether messages can still be sent or received), yet nothing orders its updates relative to the buffer writes, so one side can observe the flag and the buffer in an inconsistent state.
Demonstrate or otherwise explain how to fix the issue shown above where the thread sometimes does not receive the final message. Pasting a diff of the necessary changes to lib.rs is sufficient.
The primary issue is the lack of synchronisation, which can cause the receiver to miss updates to the buffer or see stale data. In particular, after the final send the Sender is dropped and sets hung_up, and the receiver may observe the hang-up before it sees the last buffered value, so it returns None and loses the message. The sender might also overwrite the buffer before the receiver can process an item, given the current crude check-and-set mechanism.
Utilise Rust's atomic operations or a mutex to manage access to the shared state; this ensures memory visibility and mutual exclusion, addressing the race conditions.
Pair the lock with a condition variable to manage the busy-waiting, allowing the sender and receiver to block until the buffer is ready to be written to or read from.
Here's a simplified diff for the conceptual changes needed:
+ use std::sync::{Mutex, Condvar, Arc};

  pub struct Sender<T> {
-     buffer: *mut Option<T>,
-     hung_up: *mut bool,
+     shared: Arc<(Mutex<Option<T>>, Condvar, bool)>,
  }

  pub struct Receiver<T> {
-     buffer: *mut Option<T>,
-     hung_up: *mut bool,
+     shared: Arc<(Mutex<Option<T>>, Condvar, bool)>,
  }

+ // Adjustments in constructors and send/recv methods to use mutexes and condition variables.
This example adds a tuple containing a mutex, a condition variable, and a boolean flag encapsulated within an Arc for shared ownership and thread safety. This would replace the direct usage of raw pointers and manual synchronization with more robust constructs. The mutex ensures mutual exclusion while the condition variable can be used to block a thread until a condition is met (buffer empty or full), effectively managing the busy-wait scenarios and synchronization between threads safely.
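Alternatively, a more minimal change (my own suggestion, keeping the existing raw-pointer design): reorder recv so that it drains the buffer before honouring the hang-up flag, so a value sent just before the Sender is dropped is not lost. This addresses the lost final message, though the underlying unsynchronised access remains, so the synchronisation changes above are still needed for soundness.

```rust
impl<T> Receiver<T> {
    pub fn recv(&mut self) -> Option<T> {
        loop {
            // take a pending value first, if there is one...
            if let Some(value) = unsafe { std::ptr::read(self.buffer) } {
                // clear the channel for the next message
                unsafe { std::ptr::write(self.buffer, None); }
                return Some(value);
            }

            // ...and only report "hung up" once the buffer is empty
            if unsafe { *self.hung_up } {
                return None;
            }
        }
    }
}
```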
ITS SOOOOOOOOOOOOOOOOO LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOONG.
Author: Jeff Wu