COMP6991 - Solving Modern Programming Problems with Rust
Final Exam for 23T1
With my own solutions (which might be incorrect)
Question
Question:
Explain the difference between the &str and String types, with respect to ownership.
&str is a string slice: a borrowed view into string data (often owned by a String, or a literal baked into the binary). It does not own that data.
String is an owned, growable string type: it owns its character data on the heap and is responsible for freeing it when dropped.
Explain how the behaviour regarding lifetimes differ between &str and String.
String does not need a lifetime annotation; it simply lives until its owner goes out of scope and it is dropped.
&str is a reference, so it carries a lifetime, and that lifetime must be no longer than that of the data it refers to.
Outline an example of when a &str might be used instead of a String.
Read-only access suits &str: a function that only needs to inspect a string should take &str. This avoids copying and does not take ownership, so no ownership issues arise for the caller.
Outline an example of when a String might be used instead of a &str.
When you want to modify the string or own the data (for example, to store it in a struct or return it), you should use String.
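A small sketch of both cases (my own illustration, not from the exam):

```rust
// Read-only access: take &str, so the caller keeps ownership.
fn print_len(s: &str) {
    println!("{} bytes", s.len());
}

// Producing/owning new data: return a String.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    print_len(&owned);      // &String coerces to &str
    print_len("literal");   // string literals are &'static str
    let loud = shout(&owned);
    println!("{loud}");
}
```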
Related information
The following Rust code fails to compile, but equivalent code in other popular programming languages (e.g. C, Java, Python) compiles and/or works correctly.
RUSTfn main() {
let my_string = String::from("Hello, World!");
// print the string twice
print_string(my_string);
print_string(my_string);
}
fn print_string(string: String) {
println!("{string}");
}
Question:
Explain what issue(s) prevent the Rust compiler from building this code
my_string is a String, so the first call print_string(my_string); moves its ownership into print_string. The second print_string(my_string) call cannot use my_string because the value has already been moved out of main: each value has exactly one owner, and once ownership has been given away it cannot be used again.
Explain the philosophy behind this language decision.
The ownership system protects memory safety: Rust aims to rule out, at compile time, bugs related to dangling pointers, double frees, use-after-free, and other memory problems, and the single-owner rule that makes the move above an error is part of what enables those guarantees.
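For illustration, two common ways to make the example compile (my own sketch): borrow instead of moving, or clone so each call gets its own owned copy.

```rust
fn main() {
    let my_string = String::from("Hello, World!");

    // Fix 1: borrow the string instead of moving it.
    print_string(&my_string);
    print_string(&my_string);

    // Fix 2: clone it, so the first call consumes a copy
    // and the second call consumes the original.
    print_string_owned(my_string.clone());
    print_string_owned(my_string);
}

fn print_string(string: &str) {
    println!("{string}");
}

fn print_string_owned(string: String) {
    println!("{string}");
}
```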
Question
Question:
Idiomatic Rust programs often extensively use enums when considering errors or other failure cases. Select another programming language of your choice (e.g. C, Java, Python, Go, etc.), and contrast its equivalent error-handling features with Rust.
Provide at least three individual points of contrast (3 marks), along with an overall judgement as to their merits and disadvantages. (1 mark)
NOTE:
Any reasonable programming language is an acceptable choice for this question.
Esoteric languages (e.g., Brainf***, INTERCAL, etc.) are not acceptable, and will not receive marks.
You must select a programming language. Languages that do not involve programming (e.g. markup languages (HTML, Markdown), query languages (SQL, jq), etc.) are not acceptable, and will not receive marks.
Programming languages that are slightly more niche (e.g. Haskell, Zig, Swift, Lisp) are perfectly acceptable.
If you are unsure whether your programming language is "reasonable", simply default to C.
I will compare Rust with Python.

1. Rust uses the Result<T, E> and Option<T> enums to handle errors. An Err in a Result or a None in an Option forces the calling function to deal with the failure value explicitly, rather than panicking. In Python, an unexpected situation raises an exception instead; if nothing catches it, the program crashes, which can be hard to debug.
2. In Rust the possible failure is part of the function's type, so callers must handle Err or None at each step, which can become verbose and less clear in complex call chains. Python usually keeps error handling in separate try/except blocks, but a function's exceptions are not part of its signature, so in a large program you need documentation to understand what can be raised and what it means.
3. Result and Option provide a high level of safety, preventing common errors such as null dereferences and ensuring all error cases are handled. Python's exceptions offer flexibility and ease of use, as they allow developers to separate error handling from business logic, and to pass errors across multiple layers of function calls effortlessly. This flexibility, however, can lead to uncaught exceptions if the developer is not careful.

Overall Judgement:
Both Rust's enum-based error handling and Python's exception handling have their merits and disadvantages. Rust’s method is safer and makes errors explicit but can lead to more verbose code and a steeper learning curve. Python’s exception handling is more flexible and can lead to cleaner main logic paths, but it requires careful management to avoid unhandled exceptions and potential crashes.
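A small Rust sketch of the enum-based style described above (my own illustration):

```rust
use std::num::ParseIntError;

// The possible failure is visible in the return type.
fn parse_port(s: &str) -> Result<u16, ParseIntError> {
    s.trim().parse::<u16>()
}

fn main() {
    // The caller is forced to deal with the Err case explicitly.
    match parse_port("8080") {
        Ok(port) => println!("listening on {port}"),
        Err(e) => eprintln!("invalid port: {e}"),
    }
}
```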
Related information
Boris is attempting to write a function split_once, which takes two string slices: string and split_on. If split_on occurs within string, it will return Some tuple of the slice of string that exists before the first occurrence of split_on, and the remainder of string after the first occurrence.
Boris is certain that he has written the implementation of the function correctly, but is not so confident in Rust when it comes to lifetimes. Sadly, his code does not compile currently.
RUSTpub fn split_once<'a, 'b>(string: &'a str, split_on: &'b str) -> Option<(&'b str, &'a str)> {
let index = string.find(split_on)?;
let tuple = (&string[..index], &string[index + split_on.len()..]);
Some(tuple)
}
Your task is to fix the lifetimes issue with Boris' broken code. You only have to modify lifetimes in order to solve the issue, and thus you are not permitted to change any code except that relevant to lifetimes in src/lib.rs. You are permitted to add, remove, and modify lifetimes in this exercise.
This is an example of the expected behaviour:
$ 6991 cargo run --bin exam_q2
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q2`
Some(("hello", "world"))
$ 6991 cargo run --bin exam_q2_alt
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q2_alt`
Some(("hello", "world"))
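My fix (changing only the lifetimes): both returned slices are sub-slices of string, so they must both borrow with string's lifetime 'a. The lifetime of split_on is unrelated to the return value, so 'b can simply be removed and left elided.

```rust
pub fn split_once<'a>(string: &'a str, split_on: &str) -> Option<(&'a str, &'a str)> {
    let index = string.find(split_on)?;
    let tuple = (&string[..index], &string[index + split_on.len()..]);
    Some(tuple)
}
```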
Question
Luke is trying to define a library of sports cars for use in his Rust programs. He decides that he would like to make use of the trait system to do so, and so defines the following trait:
RUSTpub trait Car {
fn brand(&self) -> CarBrand;
fn horsepower(&self) -> u32;
}
pub enum CarBrand {
Toyota,
Subaru,
Nissan,
}
Because defining a type and implementing a trait for each individual car requires a substantial amount of boilerplate, he also defines a macro to make the process simpler:
RUST#[macro_export]
macro_rules! car {
($name:ident, $brand:expr, $horsepower:literal) => {
pub struct $name;
impl Car for $name {
fn brand(&self) -> CarBrand {
use CarBrand::*;
$brand
}
fn horsepower(&self) -> u32 {
$horsepower
}
}
}
}
And so defines the individual cars like such:
RUSTcar!(Corolla, Toyota, 100);
car!(Cressida, Toyota, 160);
car!(Chaser, Toyota, 220);
car!(Liberty, Subaru, 100);
car!(Impreza, Subaru, 130);
car!(Wrx, Subaru, 200);
car!(Pulsar, Nissan, 90);
car!(Silvia, Nissan, 200);
car!(Skyline, Nissan, 220);
This all seems to have worked correctly, so he creates his first function, favourite_car, which given a particular CarBrand will return back his favourite car model from that brand:
RUSTfn favourite_car(brand: CarBrand) -> impl Car {
use CarBrand::*;
match brand {
Toyota => Cressida,
Subaru => Liberty,
Nissan => Skyline,
}
}
However, attempting to build this leads Luke into a problem:
$ 6991 cargo build
   Compiling cars v0.1.0
error[E0308]: `match` arms have incompatible types
 --> src/favourite.rs:8:13
  |
6 | /     match brand {
7 | |         Toyota => Cressida,
  | |                   -------- this is found to be of type `cars::Cressida`
8 | |         Subaru => Liberty,
  | |                   ^^^^^^^ expected struct `cars::Cressida`, found struct `cars::Liberty`
9 | |         Nissan => Skyline,
10| |     }
  | |_____- `match` arms have incompatible types
Question
Explain why Luke is receiving this error message, even with an impl Car return type.
Rust must know a single, concrete return type at compile time. impl Car does not mean "any type implementing Car"; it means "some one specific type implementing Car", chosen by the function body. Here the match arms produce three different concrete types, Cressida, Liberty, and Skyline, so the compiler cannot pick one type for the return value and reports the arms as having incompatible types.
Explain two issues Rust (the language) would face if this behaviour were to be allowed.
First, type safety and inference would break down: the compiler could no longer know which concrete type a call returns, so it could not check how the returned value is used, leading to unpredictability in type handling. Second, code generation would become a problem: different concrete types can have different sizes and layouts, so the compiler could not know how much space the return value needs or which method implementations to call without introducing some form of dynamic dispatch.
Provide an example of a possible fix for Luke's issue.
He could use a trait object as the return type instead of impl Trait. This allows returning different concrete types because the call is resolved through dynamic dispatch at runtime.
RUSTfn favourite_car(brand: CarBrand) -> Box<dyn Car> {
use CarBrand::*;
match brand {
Toyota => Box::new(Cressida),
Subaru => Box::new(Liberty),
Nissan => Box::new(Skyline),
}
}
Question
Luke decides that his car! macro still has a bit too much boilerplate. Instead of defining cars individually, he would like to only perform one macro invocation for a whole brand of cars at a time.
That is, he would like to modify his macro such that the following code works equivalently:
RUSTcar! {
brand = Toyota;
models = [
Corolla = 100,
Cressida = 160,
Chaser = 220,
];
}
car! {
brand = Subaru;
models = [
Liberty = 100,
Impreza = 130,
Wrx = 200,
];
}
car! {
brand = Nissan;
models = [
Pulsar = 90,
Silvia = 200,
Skyline = 220,
];
}
Rewrite Luke's macro as described. Provide the full macro_rules! definition in your response. (3 marks)
RUST#[macro_export]
macro_rules! car {
// Match the overall structure for the brand and models definition
(
brand = $brand:expr;
models = [
$( $name:ident = $horsepower:literal ),* $(,)*
];
) => {
// Iterate over each model definition
$(
// Declare each car model as a struct
pub struct $name;
impl Car for $name {
fn brand(&self) -> CarBrand {
// Use the passed brand directly in the generated code
use CarBrand::*;
$brand
}
fn horsepower(&self) -> u32 {
// Use the provided horsepower for each model
$horsepower
}
}
)*
};
}
Question
Luke would finally like to parameterise the Car trait to include type-level information about the car's transmission layout. There are three possible transmission layouts in Luke's library: front-wheel drive, rear-wheel drive, and all-wheel drive.
Luke knows that most cars are only sold with one choice of transmission layout. For example, all Wrxs are all-wheel drive, and all Silvias are rear-wheel drive.
However, some cars may support different transmission layouts. For example, while Corollas were originally rear-wheel drive, newer Corollas can come in both front-wheel drive and all-wheel drive!
With this in mind, Luke settles on two options for his trait design:
RUST// Layouts
struct RearWheelDrive;
struct FrontWheelDrive;
struct AllWheelDrive;
// Trait definition:
// Option 1
pub trait Car<Layout> {
fn brand(&self) -> CarBrand;
fn horsepower(&self) -> u32;
}
// Option 2
pub trait Car {
type Layout;
fn brand(&self) -> CarBrand;
fn horsepower(&self) -> u32;
}
Question
Explain the differences between Luke's two options.
In the first option, the Car trait is defined with a generic type parameter, Layout. A type names the layout at each impl, and the same car model may implement Car<Layout> several times, once for each layout it supports (for example both Car<FrontWheelDrive> and Car<AllWheelDrive>).
In the second option, the Car trait uses an associated type for the layout. This means the layout is fixed for each implementation of the Car trait and cannot vary between instances of the same type. The layout type is part of the trait's implementation and thus is defined once per struct/type, not per instance.
With the information provided, which option is more suitable for Luke's case?
Given that some car models in Luke's library may support different transmission layouts (like the Corolla example), Option 1, using a generic parameter, is more suitable. It gives the necessary flexibility: a single car model can implement Car<Layout> once for each layout it is sold with, which reflects the real-world situation where the same model is available in several configurations.
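A minimal sketch of the difference (my own illustration; the traits are renamed CarGeneric and CarAssoc here so the two designs can sit side by side):

```rust
pub enum CarBrand { Toyota, Nissan }

pub struct FrontWheelDrive;
pub struct AllWheelDrive;
pub struct RearWheelDrive;

// Option 1: generic parameter -- one model may implement the trait several times.
pub trait CarGeneric<Layout> {
    fn brand(&self) -> CarBrand;
    fn horsepower(&self) -> u32;
}

pub struct Corolla;
impl CarGeneric<FrontWheelDrive> for Corolla {
    fn brand(&self) -> CarBrand { CarBrand::Toyota }
    fn horsepower(&self) -> u32 { 100 }
}
impl CarGeneric<AllWheelDrive> for Corolla {
    fn brand(&self) -> CarBrand { CarBrand::Toyota }
    fn horsepower(&self) -> u32 { 100 }
}

// Option 2: associated type -- exactly one layout per implementing type.
pub trait CarAssoc {
    type Layout;
    fn brand(&self) -> CarBrand;
    fn horsepower(&self) -> u32;
}

pub struct Silvia;
impl CarAssoc for Silvia {
    type Layout = RearWheelDrive;
    fn brand(&self) -> CarBrand { CarBrand::Nissan }
    fn horsepower(&self) -> u32 { 200 }
}
// A second `impl CarAssoc for Silvia` with a different Layout would be
// rejected as a conflicting implementation.
```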
Question
The following program attempts to launch two new threads, which each print a shared string.
RUSTuse std::thread;
fn main() {
let the_string = String::from("Hello, World!");
let handle_1 = thread::spawn(|| {
print_string(&the_string);
});
let handle_2 = thread::spawn(|| {
print_string(&the_string);
});
handle_1.join().unwrap();
handle_2.join().unwrap();
}
fn print_string(string: &str) {
println!("{string}");
}
However, the code does not compile.
Note: the signature of std::thread::spawn
is as follows:
RUSTpub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
F: FnOnce() -> T + Send + 'static,
T: Send + 'static,
{
...
}
Question:
Explain why the code does not compile.
the_string is created inside the main function, so it only lives until the end of main's scope. The closures passed to thread::spawn capture the_string by reference, but a spawned thread may outlive main, and the compiler cannot prove that it won't. Rust therefore cannot guarantee that the_string will still exist when the threads use those references, so borrowing it into the spawned closures is rejected.
Identify the specific part of std::thread::spawn's signature that is responsible for this compiler error. (1 mark)
The bound on the closure, F: FnOnce() -> T + Send + 'static, is responsible, and in particular the 'static part: the closure must be Send so it can be moved to another thread, and everything it captures must either be owned by the closure or live for the whole program. A borrow of the local the_string is not 'static, which is why the compiler rejects the code.
Explain two possible solutions to fix this issue.
1. Wrap the string in an Arc and move a clone of the Arc into each closure, so the threads share ownership of the_string (sketched below).
2. Move ownership of the data from main into the threads, for example by cloning the String and moving one copy into each spawned closure.
to threads.Question
When lock() is called on a std::sync::Mutex, a Result containing a MutexGuard is returned.
Question:
How does the Mutex automatically unlock itself?
When the MutexGuard goes out of scope, the mutex is automatically unlocked. MutexGuard is a smart pointer that implements the Drop trait; its destructor runs at the end of the guard's scope, and that drop() implementation is what releases the lock.
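A tiny sketch of that behaviour (my own illustration):

```rust
use std::sync::Mutex;

fn main() {
    let counter = Mutex::new(0);

    {
        let mut guard = counter.lock().unwrap(); // lock acquired here
        *guard += 1;
    } // guard goes out of scope here -> Drop runs -> mutex unlocked

    // We can lock it again because the guard above released the lock.
    println!("{}", *counter.lock().unwrap()); // prints 1
}
```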
How does Rust statically prevent programmers from extracting the guarded value out of a MutexGuard?
MutexGuard never hands out ownership of the protected value, only borrows of it: Deref and DerefMut return references whose lifetimes are tied to the guard itself. Because you only ever hold a reference, you cannot move the inner value out (that would require ownership, and the value's one owner is the Mutex), and the borrow checker prevents any such reference from outliving the guard.
Question
The signature of std::iter::Iterator::map
looks approximately like the following:
RUSTpub trait Iterator {
type Item;
...
fn map<B, F>(self, f: F) -> Map<Self, F>
where
F: FnMut(Self::Item) -> B,
{
...
}
}
However, the rayon crate's equivalent function, rayon::iter::ParallelIterator::map
contains some subtle differences:
RUSTpub trait ParallelIterator: Send {
type Item: Send;
...
fn map<F, R>(self, map_op: F) -> Map<Self, F>
where
F: Fn(Self::Item) -> R + Sync + Send,
R: Send,
{
...
}
}
Question:
Justify the change in type bounds to the Item associated type.
Rayon runs the iterator's work on multiple threads, so individual Items have to be moved between threads. Requiring Item: Send ensures that ownership of each item can be transferred to another thread without breaking Rust's safety rules.
Justify the change in type bounds involving Send and Sync within fn map.
The closure F may be called from multiple threads simultaneously, so it must be Sync (safe to share by reference across threads), and it must be Send for the same reason as before: it has to be moved to the worker threads.
The result type R must also be Send, because the results may need to be transferred across threads (e.g., when collecting or reducing the output of a parallel computation).
Justify the change in function trait (FnMut to Fn) for the provided closure to map.
Using Fn rather than FnMut means the closure cannot mutate captured state, so no invocation can depend on mutable state from a previous invocation. That is exactly what is required when the closure is called concurrently from many threads: mutable access from several threads at once would be a data race, whereas an Fn closure only needs shared access.
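For reference, a small usage sketch (assumes the rayon crate is added as a dependency):

```rust
use rayon::prelude::*;

fn main() {
    let words = vec!["hello", "parallel", "world"];

    // The closure is Fn + Send + Sync: it captures no mutable state, so
    // rayon can call it from many worker threads at once. The items and
    // the usize results are both Send.
    let lengths: Vec<usize> = words.par_iter().map(|w| w.len()).collect();

    println!("{lengths:?}");
}
```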
Question
In this question, you will implement a basic in-memory cache.
A cache is a data-type which when given some input, will perform some calculation and return the result. However, if the calculation for that specific input had already been performed previously, the cache will simply return the result of the previous calculation, instead of doing the hard work again.
You have been given src/lib.rs, which describes a very simple cache primitive:
RUSTuse std::{collections::HashMap, rc::Rc};
pub struct Cache {
calculator: fn(&String) -> String,
cache_map: HashMap<String, Rc<String>>,
}
impl Cache {
pub fn new(calculator: fn(&String) -> String) -> Self {
Cache {
calculator,
cache_map: HashMap::new(),
}
}
pub fn get(&mut self, key: String) -> Rc<String> {
if let Some(value) = self.cache_map.get(&key) {
Rc::clone(value)
} else {
let value = Rc::new((self.calculator)(&key));
self.cache_map.insert(key, Rc::clone(&value));
value
}
}
}
This code correctly implements an in-memory cache that works only for key-value pairs of type String. That is, the cache will work for a calculator of type fn(&String) -> String, and allow you to query/fill the cache with the get function turning a String key into an Rc<String>
value.
Your task is to make this cache more generic. You must modify the type of the calculator from a function pointer of &String -> String
into a closure of &Key -> Value
for any possible types Key and Value. As part of this, you must also decide which type of closure is the most generic choice out of FnOnce, FnMut and Fn.
This will also require you to change the generic parameters within the HashMap from <String, Rc<String>>
into <Key, Rc<Value>>
, and similarly within the get function, as previously described in the calculator closure.
You will be required to constrain some generic type parameters as you solve the exercise. You must ensure that you do not overly constrain the types, only requiring what is minimally needed.
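My attempt at the generic version (a sketch): FnMut is the most generic closure trait that still allows the calculator to be called repeatedly (FnOnce could only be called once, and every Fn is also an FnMut), and only Key needs Eq + Hash for the HashMap operations.

```rust
use std::{collections::HashMap, hash::Hash, rc::Rc};

pub struct Cache<Key, Value, Calc> {
    calculator: Calc,
    cache_map: HashMap<Key, Rc<Value>>,
}

impl<Key, Value, Calc> Cache<Key, Value, Calc>
where
    Key: Eq + Hash,
    Calc: FnMut(&Key) -> Value,
{
    pub fn new(calculator: Calc) -> Self {
        Cache {
            calculator,
            cache_map: HashMap::new(),
        }
    }

    pub fn get(&mut self, key: Key) -> Rc<Value> {
        if let Some(value) = self.cache_map.get(&key) {
            Rc::clone(value)
        } else {
            let value = Rc::new((self.calculator)(&key));
            self.cache_map.insert(key, Rc::clone(&value));
            value
        }
    }
}
```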
Question
Question:
Is it possible to invoke undefined behaviour in Rust? If so, how? If not, why?
Yes. When you use unsafe code (dereferencing raw pointers, calling unsafe functions, FFI, and so on), you might invoke undefined behaviour, because the compiler does not check that the invariants such code relies on are actually upheld; that responsibility shifts to the programmer. In purely safe Rust, undefined behaviour should not be possible.
Explain the difference between an unsafe block and an unsafe function in Rust.
Unsafe block: a region of code, inside any function (whether or not that function is marked unsafe), where unsafe operations such as dereferencing a raw pointer are permitted. The author of the block asserts that the required invariants hold.
Unsafe function: a function that is itself marked unsafe. It can perform unsafe operations in its body, and, more importantly, it tells callers that they must uphold extra preconditions, so calling it is only allowed inside an unsafe block. The unsafety of an unsafe function is explicitly exposed to the caller, who becomes responsible for wrapping the call in an unsafe block.
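A small sketch of the distinction (my own illustration):

```rust
/// An unsafe function: callers must guarantee that `ptr` is valid,
/// so every call site needs an `unsafe` block.
unsafe fn read_first(ptr: *const i32) -> i32 {
    *ptr
}

fn main() {
    let xs = [1, 2, 3];

    // An unsafe block inside an ordinary (safe) function.
    let first = unsafe { read_first(xs.as_ptr()) };
    println!("{first}");
}
```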
Describe the technique of unsafe implementations behind safe abstractions, and explain how it reduces Rust's scope for unsoundness.
The technique is to hide unsafe code behind a safe public API: users just call the safe API and never need to know whether unsafe code is used underneath. It lets library authors use unsafe internally, to gain performance, call code written in other languages, or manipulate memory directly, while taking on the obligation that nothing a user can do through the safe interface breaks Rust's safety rules.
Users can only reach the operations the developers have exposed, so they cannot stumble into the unsafe code and misuse it. This shrinks Rust's scope for unsoundness to the small, auditable regions of unsafe code inside such abstractions; everything outside them is checked by the compiler as usual.
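A toy sketch of the pattern (my own illustration): the unsafe code is confined behind a safe function whose checks make it impossible to misuse from the outside.

```rust
pub fn first_element(slice: &[i32]) -> Option<i32> {
    if slice.is_empty() {
        None
    } else {
        // SAFETY: we just checked the slice is non-empty, so index 0 is
        // in bounds and the pointer is valid for reads.
        Some(unsafe { *slice.as_ptr() })
    }
}

fn main() {
    assert_eq!(first_element(&[10, 20]), Some(10));
    assert_eq!(first_element(&[]), None);
}
```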
Question
The following code attempts to write an (inefficient, but simple) implementation of a channel using unsafe. In particular, it implements a bounded-buffer single-producer single-consumer channel where the bound is always 1.
RUSTpub fn channel() -> (Sender<i32>, Receiver<i32>) {
let buffer = Box::into_raw(Box::new(None::<i32>));
let hung_up = Box::into_raw(Box::new(false));
let sender = Sender { buffer, hung_up };
let receiver = Receiver { buffer, hung_up };
(sender, receiver)
}
pub struct Sender<T> {
buffer: *mut Option<T>,
hung_up: *mut bool,
}
pub struct Receiver<T> {
buffer: *mut Option<T>,
hung_up: *mut bool,
}
impl<T> Sender<T> {
pub fn send(&mut self, value: T) -> Option<()> {
if unsafe { *self.hung_up } {
return None;
}
// wait until the channel is empty...
loop {
let value = unsafe { std::ptr::read(self.buffer) };
if value.is_none() { break; }
std::mem::forget(value);
}
// send the value into the shared buffer
unsafe { std::ptr::write(self.buffer, Some(value)); }
Some(())
}
}
impl<T> Receiver<T> {
pub fn recv(&mut self) -> Option<T> {
loop {
if unsafe { *self.hung_up } {
return None;
}
// wait until the value exists...
if let Some(value) = unsafe { std::ptr::read(self.buffer) } {
// clear the channel for the next message
unsafe { std::ptr::write(self.buffer, None); }
return Some(value);
}
}
}
}
unsafe impl<T: Send> Send for Sender<T> {}
unsafe impl<T> Send for Receiver<T> {}
impl<T> Drop for Sender<T> {
fn drop(&mut self) {
unsafe { *self.hung_up = true; }
}
}
impl<T> Drop for Receiver<T> {
fn drop(&mut self) {
unsafe { *self.hung_up = true; }
}
}
It can be used like this:
RUSTuse exam_q6_lib::channel;
fn main() {
std::thread::scope(|scope| {
let (mut send, mut recv) = channel();
scope.spawn(move || {
while let Some(num) = recv.recv() {
println!("Thread got {num}!");
}
println!("Thread finished!");
});
for i in 1..=5 {
println!("Sending {i}...");
send.send(i);
}
println!("Sending finished!");
drop(send);
});
}
$ 6991 cargo run
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q6`
Sending 1...
Sending 2...
Sending 3...
Thread got 1!
Sending 4...
Thread got 2!
Thread got 3!
Thread got 4!
Sending 5...
Thread got 5!
Sending finished!
Thread finished
However, sometimes when it is run, the thread never receives the final message!
   Finished dev [unoptimized + debuginfo] target(s) in 0.00s
    Running `target/debug/exam_q6`
Sending 1...
Sending 2...
Sending 3...
Thread got 1!
Thread got 2!
Thread got 3!
Sending 4...
Sending 5...
Sending finished!
Thread got 4!
Thread finished
Question:
Perform a code review on the channel implementation (not the main function), carefully examining the usage of unsafe and make a judgement on the overall soundness.
Buffer access: the buffer is handled through a raw pointer, giving both the sender and the receiver direct access to the same memory with no synchronisation (no atomics, locks, or fences). Beyond the usual risk of dereferencing an invalid pointer, this unsynchronised concurrent access is a data race, which is undefined behaviour, so these unsafe blocks are not sound as written.
Hung-up flag: the hung_up flag is similarly accessed via a raw pointer from both threads. This is another critical point because it directly controls the flow (whether messages can still be sent or received), yet nothing orders its updates relative to the buffer writes, so one side can observe the flag and the buffer in an inconsistent state.
Demonstrate or otherwise explain how to fix the issue shown above where the thread sometimes does not receive the final message. Pasting a diff of the necessary changes to lib.rs is sufficient.
The primary issue is the lack of synchronisation, which can cause the receiver to miss updates to the buffer or see stale data. In particular, after the final send the Sender is dropped and sets hung_up, and the receiver may observe the hang-up before it sees the last buffered value, so it returns None and loses the message. The sender might also overwrite the buffer before the receiver can process an item, given the current crude check-and-set mechanism.
Utilise Rust's atomic operations or a mutex to manage access to the shared state; this ensures memory visibility and mutual exclusion, addressing the race conditions.
Pair the lock with a condition variable to manage the busy-waiting, allowing the sender and receiver to block until the buffer is ready to be written to or read from.
Here's a simplified diff for the conceptual changes needed:
+ use std::sync::{Mutex, Condvar, Arc};

  pub struct Sender<T> {
-     buffer: *mut Option<T>,
-     hung_up: *mut bool,
+     shared: Arc<(Mutex<Option<T>>, Condvar, bool)>,
  }

  pub struct Receiver<T> {
-     buffer: *mut Option<T>,
-     hung_up: *mut bool,
+     shared: Arc<(Mutex<Option<T>>, Condvar, bool)>,
  }

+ // Adjustments in constructors and send/recv methods to use mutexes and condition variables.
This example adds a tuple containing a mutex, a condition variable, and a boolean flag encapsulated within an Arc for shared ownership and thread safety. This would replace the direct usage of raw pointers and manual synchronization with more robust constructs. The mutex ensures mutual exclusion while the condition variable can be used to block a thread until a condition is met (buffer empty or full), effectively managing the busy-wait scenarios and synchronization between threads safely.
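Alternatively, a more minimal change (my own suggestion, keeping the existing raw-pointer design): reorder recv so that it drains the buffer before honouring the hang-up flag, so a value sent just before the Sender is dropped is not lost. This addresses the lost final message, though the underlying unsynchronised access remains, so the synchronisation changes above are still needed for soundness.

```rust
impl<T> Receiver<T> {
    pub fn recv(&mut self) -> Option<T> {
        loop {
            // take a pending value first, if there is one...
            if let Some(value) = unsafe { std::ptr::read(self.buffer) } {
                // clear the channel for the next message
                unsafe { std::ptr::write(self.buffer, None); }
                return Some(value);
            }

            // ...and only report "hung up" once the buffer is empty
            if unsafe { *self.hung_up } {
                return None;
            }
        }
    }
}
```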
ITS SOOOOOOOOOOOOOOOOO LOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOONG.
Author: Jeff Wu