COMP6991 - Solving Modern Programming Problems with Rust

Prac Exam for 22T3

With my own solution (might be incorrect)

Question 1

1.1

Question

A C programmer who is starting to learn Rust has asked: "Aren't match statements just complicated if statements?". Give a specific example of a situation where you believe a match statement would significantly improve code quality, instead of a series of if/else statements.

In simple cases a match statement behaves like a chain of if/else statements. However, it is cleaner and more expressive, especially when working with enums or handling several cases at once, and the compiler checks that every case is covered.

For example, suppose I am at a crossroads and a sign tells me where to go. The sign can only show one of three instructions, so we can model it as an enum:

RUST
enum SignText {
    TurnLeft,
    TurnRight,
    GoStraight,
}

In Rust, we could use match to handle multiple cases:

RUST
fn action(text: SignText) -> &'static str {
    match text {
        SignText::TurnLeft => "Turn Left",
        SignText::TurnRight => "Turn Right",
        SignText::GoStraight => "Go Straight",
    }
}

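However, the equivalent if/else chain is more complex. It needs SignText to implement PartialEq, and it needs a final else branch that can never run, because the compiler cannot check that a chain of comparisons is exhaustive: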

RUST
// Note: `==` only compiles if SignText has #[derive(PartialEq)].
fn action(text: SignText) -> &'static str {
    if text == SignText::TurnLeft {
        "Turn Left"
    } else if text == SignText::TurnRight {
        "Turn Right"
    } else if text == SignText::GoStraight {
        "Go Straight"
    } else {
        "unreachable"
    }
}
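
The gap widens once a variant carries data. As a rough sketch, assume a hypothetical extra variant SpeedLimit(u32) that is not part of the example above: match can destructure the payload in place, and the compiler would reject the function if any variant were left out.

```rust
// Hypothetical extension of the sign example: one variant carries data.
enum SignText {
    TurnLeft,
    TurnRight,
    GoStraight,
    SpeedLimit(u32), // assumed variant, for illustration only
}

fn action(text: SignText) -> String {
    match text {
        SignText::TurnLeft => "Turn Left".to_string(),
        SignText::TurnRight => "Turn Right".to_string(),
        SignText::GoStraight => "Go Straight".to_string(),
        // Destructure the payload and branch on it with a guard.
        SignText::SpeedLimit(kmh) if kmh <= 40 => format!("Slow down to {kmh}"),
        SignText::SpeedLimit(kmh) => format!("Keep under {kmh}"),
    }
}

fn main() {
    let signs = [
        SignText::TurnLeft,
        SignText::TurnRight,
        SignText::GoStraight,
        SignText::SpeedLimit(40),
    ];
    for sign in signs {
        println!("{}", action(sign));
    }
}
```

Removing any one arm turns the missing case into a compile error, which an if/else chain cannot offer.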

1.2

Question

The following Rust code fails to compile, but equivalent code in other popular programming languages (e.g. C, Java, Python) compiles and/or works correctly. Explain what issue(s) prevent the Rust compiler from building this code, and the philosophy behind this language decision.

RUST
struct Coordinate {
    x: i32,
    y: i32,
};

let coord1 = Coordinate {x: 1, y: 2};
let coord2 = coord1;
let coord_sum = Coordinate { x: coord1.x + coord2.x, y: coord1.y + coord2.y };

The problem is ownership. Coordinate does not implement Copy, so the assignment let coord2 = coord1; moves the value: ownership of coord1 is transferred to coord2, and coord1 can no longer be used. Therefore, coord_sum cannot read coord1.x or coord1.y.

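Rust uses ownership so that every value has exactly one owner, which is responsible for freeing it. This rules out use-after-free and double-free bugs at compile time without needing a garbage collector, and the same rules help prevent data races in concurrent code.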
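
One possible fix, sketched here under the assumption that Coordinate is cheap enough to duplicate, is to opt in to copy semantics by deriving Clone and Copy. The helper sum_with_copy is a name made up for this example; borrowing coord1 instead of copying it would work as well.

```rust
// Opting in to copy semantics: `let coord2 = coord1;` now duplicates the value.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Coordinate {
    x: i32,
    y: i32,
}

// Hypothetical helper wrapping the statements from the question.
fn sum_with_copy() -> Coordinate {
    let coord1 = Coordinate { x: 1, y: 2 };
    let coord2 = coord1; // a copy, so coord1 stays usable
    Coordinate { x: coord1.x + coord2.x, y: coord1.y + coord2.y }
}

fn main() {
    println!("{:?}", sum_with_copy());
}
```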

1.3

Question

In other languages, the operation: "first_string" + "second_string" produces a new string, "first_stringsecond_string". This particular operation does not work in Rust.

Why does Rust not implement this operation on the &str type?
Would it be possible for the Rust language developers to implement this? What rust feature would they use to implement it?
Do you think the Rust language developers should implement this operation? Give one reason to justify your answer.

Why does Rust not implement this operation on the &str type?

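Because concatenation has to allocate new memory to hold the combined string. A &str is only a borrowed view into existing string data: it owns nothing and cannot grow. The result would therefore have to be a new owned String, and Rust prefers to make that heap allocation explicit rather than hide it behind the + operator.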

Would it be possible for the Rust language developers to implement this? What rust feature would they use to implement it?

Yes. The Add trait (std::ops::Add) already exists and is how Rust overloads the + operator. The developers could implement Add<&str> for &str with String as the Output type, so that adding two &str values produces a new String.
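
Code outside the standard library cannot add this impl to &str directly, because both Add and &str are foreign to it (the orphan rule). The sketch below therefore uses a hypothetical wrapper type, Text, purely to show the shape such an implementation could take.

```rust
use std::ops::Add;

// Hypothetical wrapper; `Text` is not a standard library type.
#[derive(Clone, Copy)]
struct Text<'a>(&'a str);

impl<'a, 'b> Add<Text<'b>> for Text<'a> {
    type Output = String;

    fn add(self, rhs: Text<'b>) -> String {
        // The allocation that `+` would hide happens here.
        let mut out = String::with_capacity(self.0.len() + rhs.0.len());
        out.push_str(self.0);
        out.push_str(rhs.0);
        out
    }
}

fn main() {
    let joined = Text("first_string") + Text("second_string");
    println!("{joined}");
}
```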

Do you think the Rust language developers should implement this operation? Give one reason to justify your answer.

No, I do not think so. Rust's main goals are safety and reliability, which rest on ownership, borrowing and explicit memory management. An implicit conversion from &str to String would hide a heap allocation behind an innocent-looking operator. Hidden costs like this can cause hard-to-find problems across a whole project, which could be serious in a production environment.

Question 2

Question

In this activity, you will be building a small text searching system. It should search a large string for sentences that contain a particular search term. Another function will then look through all the search results to determine how often each sentence was found.

You have been given starter code which does not yet compile. Your task is to fill in both todo!() statements, as well as to add lifetimes where required in order to build your code.

You are not permitted to change the return type of functions, the names of structs, or the types of structs. You may also not change the main function, and you should expect that the main function could be changed during testing. You will, however, have to add lifetimes to existing types in order to successfully compile your code.

This is an example of the expected behaviour:


$  6991 cargo run test_data/test_data.txt 
    Finished dev [unoptimized + debuginfo] target(s) in 0.36s
    Running `target/debug/prac_q2`
there
very
prove
the universe

    Ctrl - D

Found 1 results for 'there'.
Found 9 results for 'very'.
Found 1 results for 'prove'.
Found 11 results for 'the universe'.
'8 billion years ago, space expanded very quickly (thus the name "Big Bang")' occured 1 times.
'According to the theory the universe began as a very hot, small, and dense superforce (the mix of the four fundamental forces), with no stars, atoms, form, or structure (called a "singularity")' occured 2 times.
'Amounts of very light elements, such as hydrogen, helium, and lithium seem to agree with the theory of the Big Bang' occured 1 times.
'As a whole, the universe is growing and the temperature is falling as time passes' occured 1 times.
'Because most things become colder as they expand, scientists assume that the universe was very small and very hot when it started' occured 2 times.
'By measuring the redshift, scientists proved that the universe is expanding, and they can work out how fast the object is moving away from the Earth' occured 2 times.
'Cosmology is the study of how the universe began and its development' occured 1 times.
'Other observations that support the Big Bang theory are the amounts of chemical elements in the universe' occured 1 times.
'The Big Bang is a scientific theory about how the universe started, and then made the stars and galaxies we see today' occured 1 times.
'The Big Bang is the name that scientists use for the most common theory of the universe, from the very early stages to the present day' occured 2 times.
'The more redshift there is, the faster the object is moving away' occured 1 times.
'The most commonly considered alternatives are called the Steady State theory and Plasma cosmology, according to both of which the universe has no beginning or end' occured 1 times.
'The most important is the redshift of very far away galaxies' occured 1 times.
'These electromagnetic waves are everywhere in the universe' occured 2 times.
'This radiation is now very weak and cold, but is thought to have been very strong and very hot a long time ago' occured 1 times.
'With very exact observation and measurements, scientists believe that the universe was a singularity approximately 13' occured 2 times.

RUST
use std::fs;
use std::env;
use std::io::{self, BufRead};
use std::error::Error;
use std::collections::HashMap;

// NOTE: You *may not* change the names or types of the members of this struct.
//       You may only add lifetime-relevant syntax.
pub struct SearchResult<'a, 'b> {
    pub matches: Vec<&'a str>,
    pub contains: &'b str
}

/// Returns a [`SearchResult`] struct, where the matches vec is
/// a vector of every sentence that contains `contains`.
///
/// A sentence is defined as a slice of an `&str` which is the first
/// character of the string, or the first non-space character after
/// a full-stop (`.`), all the way until the last non-space character
/// before a full-stop or the end of the string.
///
/// For example, In the string "Hello. I am Tom . Goodbye", the three
/// sentences are "Hello", "I am Tom" and "Goodbye"
fn find_sentences_containing<'a, 'b>(text: &'a str, contains: &'b str) -> SearchResult<'a, 'b> {
    let mut sentences = Vec::new();
    let mut start = 0;

    for (index, character) in text.char_indices() {
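        // Compare against the end of this character (not `len() - 1`) so that a
        // multi-byte final character still closes the last sentence.
        let is_end_of_sentence = character == '.' || index + character.len_utf8() == text.len();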
        if is_end_of_sentence {
            let end = if character == '.' { index } else { text.len() };
            let sentence = text[start..end].trim();
            if sentence.contains(contains) {
                sentences.push(sentence);
            }
            start = end + 1;
        }
    }

    SearchResult { 
        matches: sentences, 
        contains 
    }
}


/// Given a vec of [`SearchResult`]s, return a hashmap, which lists how many
/// times each sentence occurred in the search results.
fn count_sentence_matches<'a, 'b>(searches: Vec<SearchResult<'a, 'b>>) -> HashMap<&'a str, i32> {
    let mut counts = HashMap::new();
    for search_result in searches {
        for sentence in search_result.matches {
            let count = counts.entry(sentence).or_insert(0);
            *count += 1;
        }
    }
    counts
}


/////////// DO NOT CHANGE BELOW HERE ///////////

fn main() -> Result<(), Box<dyn Error>> {
    let args: Vec<String> = env::args().collect();
    let file_path = &args[1];

    let text = fs::read_to_string(file_path)?;

    let mut sentence_matches = {
        let mut found = vec![];

        let stdin = io::stdin();
        let matches = stdin.lock().lines().map(|l| l.unwrap()).collect::<Vec<_>>();
        for line in matches.iter() {
            let search_result = find_sentences_containing(&text, line);
            println!("Found {} results for '{}'.", search_result.matches.len(), search_result.contains);
            found.push(search_result);
        }

        count_sentence_matches(found).into_iter().collect::<Vec<_>>()
    };
    sentence_matches.sort();

    for (key, value) in sentence_matches {
        println!("'{}' occured {} times.", key, value);
    }

    Ok(())
}

Question 3

Question

In this question, your task is to complete two functions, and make them generic: zip_tuple and unzip_tuple. Right now, the zip_tuple function takes a Vec<Coordinate> and returns a tuple: (Vec<i32>, Vec<i32>). The unzip_tuple function performs the inverse of this.

This code currently does not compile, because q3_lib (i.e. lib.rs) does not know what the type of Coordinate is. Rather than telling the functions what type Coordinate is, in this exercise we will make the functions generic, such that it works for both q3_a (i.e. main_1.rs) and q3_b (i.e. main_2.rs). This is to say, tuple_unzip should work for any Vec<T> such that T implements Into into a 2-tuple of any 2 types, and tuple_zip should work for any Vec<(T, U)> such that (T, U) implements Into into any type.

Once you have modified your function signatures for tuple_unzip and tuple_zip, you should find that the only concrete type appearing within the signature is Vec. In other words, the functions should work for any type which can be created from a 2-tuple and which can be converted into a 2-tuple.

RUST
pub fn tuple_unzip<T, A, B>(items: Vec<T>) -> (Vec<A>, Vec<B>)
where
    T: Into<(A, B)>,
{
    let mut first = Vec::new();
    let mut second = Vec::new();
    for item in items {
        let (a, b) = item.into();
        first.push(a);
        second.push(b);
    }
    (first, second)
}

pub fn tuple_zip<T, A, B>(items: (Vec<A>, Vec<B>)) -> Vec<T>
where
    T: From<(A, B)>,
{
    items.0.into_iter().zip(items.1.into_iter()).map(|(a, b)| T::from((a, b))).collect()
}
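
As a usage sketch, the two functions can be exercised with a caller-side type. The Coordinate struct, its two From impls and the round_trip helper below are assumptions standing in for whatever main_1.rs actually defines; the two library functions are repeated only so that the example is self-contained.

```rust
// Hypothetical caller type, standing in for the Coordinate in main_1.rs.
#[derive(Debug, PartialEq)]
struct Coordinate {
    x: i32,
    y: i32,
}

impl From<Coordinate> for (i32, i32) {
    fn from(c: Coordinate) -> Self {
        (c.x, c.y)
    }
}

impl From<(i32, i32)> for Coordinate {
    fn from((x, y): (i32, i32)) -> Self {
        Coordinate { x, y }
    }
}

pub fn tuple_unzip<T, A, B>(items: Vec<T>) -> (Vec<A>, Vec<B>)
where
    T: Into<(A, B)>,
{
    let mut first = Vec::new();
    let mut second = Vec::new();
    for item in items {
        let (a, b) = item.into();
        first.push(a);
        second.push(b);
    }
    (first, second)
}

pub fn tuple_zip<T, A, B>(items: (Vec<A>, Vec<B>)) -> Vec<T>
where
    T: From<(A, B)>,
{
    items.0.into_iter().zip(items.1).map(T::from).collect()
}

// Unzips into two vectors and zips them back into Coordinates.
fn round_trip(coords: Vec<Coordinate>) -> Vec<Coordinate> {
    let (xs, ys): (Vec<i32>, Vec<i32>) = tuple_unzip(coords);
    tuple_zip((xs, ys))
}

fn main() {
    let coords = vec![Coordinate { x: 1, y: 2 }, Coordinate { x: 3, y: 4 }];
    println!("{:?}", round_trip(coords));
}
```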

Question 4

4.1

Question

Steve is writing some Rust code for a generic data structure, and creates a (simplified) overall design alike the following:

RUST
struct S {
    // some fields...
}

impl S {
    fn my_func<T>(value: T) {
        todo!()
    }
}

He soon finds that this design is not sufficient to model his data structure, and revises the design as such:

RUST
struct S<T> {
    // some fields...
}

impl<T> S<T> {
    fn my_func(value: T) {
        todo!()
    }
}

Give an example of a data-structure that Steve could be trying to implement, such that his first design would not be sufficient, and instead his second design would be required for a correct implementation. Furthermore, explain why this is the case.

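An example is a generic container such as a stack or a linked list, which has to store values of type T inside the struct. In the first design, T is only declared on my_func, so it exists only for the duration of one call: the fields of S cannot mention T, and S has the same layout regardless of the data type it should operate on. my_func can operate on a T, but S has no field in which to store it, and nothing forces two calls to use the same T.

In the second design, T is a parameter of the struct itself, so S<T> can have generic fields that store data of type T, and every method on one S<T> is tied to the same T.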

RUST
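// First design: the element type must be fixed in advance.
struct S {
    elements: Vec<i32>,   // can only hold i32 values
    // ...
}

// Second design: the element type is a parameter of the struct.
struct S<T> {
    elements: Vec<T>,     // can hold values of any type T
    // ...
}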
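
To make this concrete, here is a minimal sketch of a generic stack as one possible instance of the second design. The name Stack and its methods are chosen for illustration and do not come from the question.

```rust
// A minimal generic stack: the struct itself must be generic to store T.
struct Stack<T> {
    elements: Vec<T>,
}

impl<T> Stack<T> {
    fn new() -> Self {
        Stack { elements: Vec::new() }
    }

    // Stores the value inside the struct, which the first design cannot do.
    fn push(&mut self, value: T) {
        self.elements.push(value);
    }

    fn pop(&mut self) -> Option<T> {
        self.elements.pop()
    }
}

fn main() {
    let mut stack = Stack::new();
    stack.push("a");
    stack.push("b");
    // stack.push(1); // would not compile: this Stack<&str> only accepts &str
    println!("{:?}", stack.pop());
}
```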

4.2

Question

Emily is designing a function that has different possibilities for the value it may return. She is currently deciding what kind of type she should use to represent this property of her function.

She has narrowed down three possible options:

An enum
A trait object
A generic type (as fn foo(...) -> impl Trait)

For each of her possible options, explain one possible advantage and one possible disadvantage of that particular choice.

An enum
- Advantage: Each variant can carry its own data, and the caller can handle every possibility with an exhaustive match. It is suitable for a fixed, known set of cases.
- Disadvantage: The set of variants is closed. If she wants to add a new kind of return value in the future, she has to modify the definition of the enum, and a lot of code that matches on it could be affected.
A trait object
- Advantage: It uses dynamic dispatch, so the function can return different concrete types at runtime. New types can be added later without modifying an enum definition.
- Disadvantage: Dynamic dispatch has a runtime cost, and the value usually has to be boxed on the heap. In addition, every returned type must implement the same trait, and the caller can only use the methods of that trait.
A generic type (as fn foo(...) -> impl Trait)
- Advantage: impl Trait hides the concrete return type behind a trait while still using static dispatch, so there is no boxing or runtime dispatch cost.
- Disadvantage: A function returning impl Trait can only return one concrete type. It cannot return different types from different branches.
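
The three options can be compared side by side in a small sketch. All names here (Value, describe, as_enum, as_trait_object, as_impl_trait) are invented for illustration, and Display merely stands in for whatever trait Emily's return values share.

```rust
use std::fmt::Display;

// Option 1: an enum with a closed set of variants.
enum Value {
    Number(i32),
    Text(String),
}

fn as_enum(flag: bool) -> Value {
    if flag {
        Value::Number(42)
    } else {
        Value::Text("forty-two".to_string())
    }
}

// The caller handles every variant with an exhaustive match.
fn describe(value: &Value) -> String {
    match value {
        Value::Number(n) => format!("number {n}"),
        Value::Text(t) => format!("text {t}"),
    }
}

// Option 2: a trait object; each branch may return a different concrete type.
fn as_trait_object(flag: bool) -> Box<dyn Display> {
    if flag {
        Box::new(42)
    } else {
        Box::new("forty-two")
    }
}

// Option 3: impl Trait; both branches must produce the same concrete type.
fn as_impl_trait(flag: bool) -> impl Display {
    if flag { 42 } else { 0 }
}

fn main() {
    println!("{}", describe(&as_enum(true)));
    println!("{}", as_trait_object(false));
    println!("{}", as_impl_trait(true));
}
```

Changing the else branch of as_impl_trait to return a string literal would be rejected by the compiler, which is the single-concrete-type restriction in action.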

4.3

Question

Rust's macro system offers an extremely flexible method for code generation and transfiguring syntax, but this language feature comes with certain costs. Identify 3 downsides to the inclusion, design, or implementation of Rust's macro system.

(Note that your downsides may span any amount and combination of the categories above. e.g. you could write all 3 on just one category, or one on each, or anything in-between.)

Increased compilation time: Every macro invocation has to be expanded and the resulting code compiled, which adds complexity and build time, especially in projects with many macro invocations.
Increased debugging difficulty: When macro-generated code is wrong, the error message may point at the expanded code produced by the macro rather than at the source code the developer actually wrote.
Reduced code readability: Macros can introduce non-standard syntax, which can make code harder for other developers to understand and maintain.

Question 5

5.1

Question

In many other popular programming languages, mutexes provide lock() and unlock() methods which generally do not return any value (i.e. void).

What issues could this cause?

How does Rust differently implement the interface of a Mutex, and what potential problems does that help solve?

What issues could this cause?

Forgetting to unlock: A developer may forget to call unlock() after a lock(), so the lock is never released and other threads may deadlock.
Exception before unlock: If the code throws an exception or returns early between lock() and unlock(), the unlock is skipped, again leaving the lock held.
More complexity: The mutex is not tied to the data it protects, so the developer has to track the state of every lock by hand, and nothing stops code from accessing the data without holding the lock.

How does Rust differently implement the interface of a Mutex, and what potential problems does that help solve?

Scope-based unlocking: lock() returns a guard object (MutexGuard). When the guard goes out of scope, however that scope is exited, the lock is released automatically.
Compile-time checking: The protected data lives inside the Mutex<T> and can only be reached through the guard, so the compiler rejects any code that tries to access the data without first taking the lock. (It does not detect deadlocks.)
Increased code readability: There is no separate unlock call to match up, so other developers can see exactly where the lock is held from the scope of the guard.
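
A small sketch of the guard-based interface follows. The function count_in_parallel and its parameters are made up for this example; the point is that the counter is only reachable through the guard returned by lock(), and the guard unlocks when it is dropped.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: several threads increment one shared counter.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The data is only accessible through the guard.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // The guard is dropped here, which releases the lock.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", count_in_parallel(4, 1000));
}
```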

5.2

Question

In Rust, locking a Mutex returns a Result, instead of simply a MutexGuard. Explain what utility this provides, and why a programmer might find this important.

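Detecting poisoning: lock() does not fail because the mutex is busy; it simply blocks until the lock is free (try_lock() is the non-blocking variant). It returns an Err only when the mutex is poisoned, which means another thread panicked while holding the lock, so the protected data may have been left half-updated.
Choosing how to recover: The Result forces the programmer to decide what to do in that case rather than silently using possibly inconsistent data. They can propagate the failure (for example with unwrap()), or, because the error still contains the guard, deliberately take the data back and repair it. This is important in a concurrent program that must keep running after one thread fails.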
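
The following sketch shows poisoning in practice, using only standard library calls; the function name poison_and_recover is invented for this example. A worker thread panics while holding the lock (its panic message is printed to stderr), and the main thread then recovers the value from the error.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Returns whether the mutex was poisoned and the value recovered from it.
fn poison_and_recover() -> (bool, i32) {
    let data = Arc::new(Mutex::new(0));
    let worker = Arc::clone(&data);

    // This thread panics while holding the lock, which poisons the mutex.
    let result = thread::spawn(move || {
        let mut guard = worker.lock().unwrap();
        *guard = 1;
        panic!("worker failed mid-update");
    })
    .join();
    assert!(result.is_err());

    let was_poisoned = data.is_poisoned();

    // lock() now returns Err, but the error still carries the guard,
    // so the caller can choose to recover the data.
    let value = match data.lock() {
        Ok(guard) => *guard,
        Err(poisoned) => *poisoned.into_inner(),
    };
    (was_poisoned, value)
}

fn main() {
    println!("{:?}", poison_and_recover());
}
```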

5.3

Question

While reviewing someone's code, you find the following type: Box<dyn Fn() -> i32 + Send>.

Explain what the + Send means in the code above?

Explain one reason you might need to mark a type as Send, and what restrictions apply when writing a closure that must be Send.

Explain what the + Send means in the code above?

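Send is a marker trait for types that can safely be transferred from one thread to another, including their ownership. Here, + Send is an extra bound on the trait object: whatever closure is stored in the Box must be Send, so the boxed closure can be handed to another thread.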

Explain one reason you might need to mark a type as Send, and what restrictions apply when writing a closure that must be Send.

Reason: You might want to use this type in a multithreaded environment, for example by passing it to thread::spawn or sending it through a channel. The Send bound lets the compiler guarantee that moving it across threads does not break Rust's safety rules.

Restriction:

When a closure needs to be Send, it cannot capture any non-Send values, such as an Rc or a raw pointer.
Every variable the closure captures must satisfy the Send constraint. In particular, a captured reference &T is only Send if T is Sync.
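
As a brief sketch of the restriction, the functions make_job and run_on_thread below are hypothetical helpers. The closure captures an Arc, which is Send here; capturing an Rc instead would make the closure non-Send and the code would not compile.

```rust
use std::sync::Arc;
use std::thread;

// Builds a boxed closure that is allowed to cross thread boundaries.
fn make_job(shared: Arc<Vec<i32>>) -> Box<dyn Fn() -> i32 + Send> {
    // Capturing Arc<Vec<i32>> keeps the closure Send.
    // Capturing an Rc<Vec<i32>> here would be rejected by the compiler.
    Box::new(move || shared.iter().sum())
}

// Moves the boxed closure to another thread and runs it there.
fn run_on_thread(job: Box<dyn Fn() -> i32 + Send>) -> i32 {
    thread::spawn(move || job()).join().unwrap()
}

fn main() {
    let job = make_job(Arc::new(vec![1, 2, 3]));
    println!("{}", run_on_thread(job));
}
```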

5.4

Question

Your friend tells you they don't need the standard library's channels, since they've implemented their own alternative with the following code:

RUST
use std::collections::VecDeque;
use std::sync::Mutex;
use std::sync::Arc;
use std::thread;

#[derive(Clone, Debug)]
struct MyChannel<T> {
    internals: Arc<Mutex<VecDeque<T>>>
}

impl<T> MyChannel<T> {
    fn new() -> MyChannel<T> {
        MyChannel {
            internals: Arc::new(Mutex::new(VecDeque::new()))
        }
    }
    fn send(&mut self, value: T) {
        let mut internals = self.internals.lock().unwrap();
        internals.push_front(value);
    }

    fn try_recv(&mut self) -> Option<T> {
        let mut internals = self.internals.lock().unwrap();
        internals.pop_back()
    }
}

fn main() {
    let mut sender = MyChannel::<i32>::new();
    let mut receiver = sender.clone();
    sender.send(5);
    thread::spawn(move || {
        println!("{:?}", receiver.try_recv())
    }).join().unwrap();
}

Identify a use-case where this implementation would not be sufficient, but the standard library's channel would be.

Furthermore, explain why this is the case.

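A use-case is a worker thread that has to wait for the next message from a producer. The standard library's channels provide a blocking receive operation (recv()): when the queue is empty, it puts the thread to sleep until data is available. This implementation can only return None immediately when the queue is empty, so the receiver has to call try_recv() in a loop. That is busy-waiting, and it wastes CPU resources.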
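
For comparison, a minimal sketch with the standard library's std::sync::mpsc channel is shown below. The function name receive_after_delay and the 50 ms delay are arbitrary choices for illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// The receiver waits for a value that is sent some time later.
fn receive_after_delay() -> i32 {
    let (sender, receiver) = mpsc::channel();

    let producer = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        sender.send(5).unwrap();
    });

    // recv() sleeps until a value arrives; no polling loop is needed.
    let value = receiver.recv().unwrap();
    producer.join().unwrap();
    value
}

fn main() {
    println!("{}", receive_after_delay());
}
```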

Question 6

Question

The "Read Copy Update" pattern is a common way of working with data when many sources need to be able to access data, but also to update it. It allows a user to access a value whenever it's needed, achieving this by never guaranteeing that the data is always the latest copy. In other words, there will always be something, but it might be slightly old. In some cases, this trade-off is one that's worth making.

In this task, you will be implementing a small RCU data-structure. You should ensure that:

Multiple threads are able to access a given piece of data.
Threads can pass a closure to the type which updates the data.
When created, the RCU type starts at generation 0. Every time it is updated, that counter is increased by one.

You have been given some starter code for the type RcuType<T>, including some suggested fields, and the required interface. Ensure you first understand the requirements of this task, and then implement the methods described in the starter code.

RUST
use std::sync::{RwLock, Arc, atomic::{AtomicUsize, Ordering}};

pub struct RCUType<T> {
    data: Arc<RwLock<Arc<T>>>,
    generation: Arc<AtomicUsize>,
}

impl<T> RCUType<T> {
    /// Creates a new `RCUType` with a given value.
    pub fn new(value: T) -> RCUType<T> {
        RCUType {
            data: Arc::new(RwLock::new(Arc::new(value))),
            generation: Arc::new(AtomicUsize::new(0)),
        }
    }

    /// Will call the closure `updater`, passing the current
    /// value of the type; allowing the user to return a new
    /// value for this to store.
    pub fn update(&self, updater: impl FnOnce(&T) -> T) {
        let mut data_guard = self.data.write().unwrap();
        let new_value = updater(&data_guard);
        *data_guard = Arc::new(new_value);
        self.generation.fetch_add(1, Ordering::SeqCst);
    }

    /// Returns an atomically reference counted smart-pointer
    /// to the most recent copy of data this function has.
    pub fn get(&self) -> Arc<T> {
        Arc::clone(&self.data.read().unwrap())
    }

    /// Return the number of times that the RCUType has been updated.
    pub fn get_generation(&self) -> usize {
        self.generation.load(Ordering::SeqCst)
    }
}

impl<T> Clone for RCUType<T> {
    fn clone(&self) -> Self {
        Self {
            data: self.data.clone(),
            generation: self.generation.clone(),
        }
    }
}

Question 7

7.1

Question

Gavin writes a blog post critical of Rust, especially with respect to unsafe. In his blog post, he claims that it's not possible to have any confidence in the overall safety of a Rust program since "even if you only write safe Rust, most standard functions you call will have unsafe code inside them".

State to what extent you agree with Gavin's claim.
Give at least three arguments that support your conclusion.

I partially disagree. It is true that there is a lot of unsafe code in the Rust standard library; however, that does not mean that a Rust program is unsafe.

Unsafe code must exist: This unsafe code usually interacts directly with the operating system or hardware, or implements low-level primitives, which cannot be expressed within Rust's safety rules. However, the standard library is strictly reviewed: its code has passed automated tests and developers' reviews.
Encapsulation: Users do not touch this unsafe code directly, because std wraps it in safe APIs that uphold the required invariants. Developers who only write safe Rust and follow its rules avoid most memory-safety errors.
Compiler check: The compiler still applies type checking and borrow checking inside unsafe blocks; unsafe only permits a few extra operations, such as dereferencing raw pointers. Because those operations must be marked with the unsafe keyword, a memory-safety bug can be traced to a small, auditable part of the code.
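
The encapsulation argument can be illustrated with a small sketch. The function middle is a made-up example, not a standard library function: it uses an unsafe call internally, but checks the invariant first, so no safe caller can trigger undefined behaviour through it.

```rust
/// Returns the middle element of a slice, or None if the slice is empty.
fn middle(items: &[i32]) -> Option<i32> {
    if items.is_empty() {
        return None;
    }
    let index = items.len() / 2;
    // SAFETY: the slice is non-empty, so len / 2 < len and the index is in bounds.
    Some(unsafe { *items.get_unchecked(index) })
}

fn main() {
    println!("{:?}", middle(&[1, 2, 3]));
    println!("{:?}", middle(&[]));
}
```

Only the few lines around the unsafe block need to be audited; everything that calls middle stays within safe Rust.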

7.2

Question

Hannah writes a Rust program that intends to call some C code directly through FFI. Her C function has the following prototype:

C
int array_sum(int *array, int array_size);

Note that you can assume that this C code is written entirely correctly, and the below extern "C" block is an accurate translation of the C interface.

Her Rust code is currently written as follows:

RUST
use std::ffi::c_int;

#[link(name = "c_array")]
extern "C" {
    fn array_sum(array: *mut c_int, array_size: c_int) -> c_int;
}

fn test_data() -> (*mut c_int, c_int) {
    let size = 10;
    let array = vec![6991; size].as_mut_ptr();
    (array, size as c_int)
}

fn main() {
    let sum = {
        let (array, size) = test_data();

        // Debug print:
        let message = format!("Calling C function with array of size: {size}");
        println!("{message}");

        unsafe { array_sum(array, size) }
    };

    println!("C says the sum was: {sum}");
}

She expects that if she runs her code, it should print that the C code summed to 69910. To her surprise, she runs the program and finds the following:


$ 6991 cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/ffi`
Calling C function with array of size: 10
C says the sum was: -2039199222

Hannah correctly concludes that there must be a problem with her Rust code.

Identify the issue that is causing the program to misbehave.
Describe a practical solution Hannah could use to fix the bug.
Explain why Rust wasn't able to catch this issue at compile-time.

Identify the issue that is causing the program to misbehave.

The expression vec![6991; size].as_mut_ptr() creates a temporary Vec, takes a raw pointer into its buffer, and then drops the Vec at the end of that statement, because nothing owns it. The buffer is freed before test_data() even returns. Therefore, by the time the pointer reaches array_sum it is a dangling pointer, and reading through it produces the garbage output.

Describe a practical solution Hannah could use to fix the bug.

The Vec must live in a scope that lasts at least as long as the call to array_sum, so that the pointer is still valid when it is used.

For example, we could change test_data() to return the Vec together with the pointer.

RUST
fn test_data() -> (Vec<c_int>, *mut c_int, c_int) {
    let size = 10;
    let mut array = vec![6991; size];
    let ptr = array.as_mut_ptr();
    (array, ptr, size as c_int)
}

fn main() {

    // `_array` keeps the vector (and the buffer `ptr` points into) alive
    // until the end of main
    let (_array, ptr, size) = test_data();

    let sum = unsafe { array_sum(ptr, size) };

    println!("C says the sum was: {sum}");
}

Explain why Rust wasn't able to catch this issue at compile-time.

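Because the borrow checker does not track raw pointers. A raw pointer such as *mut c_int carries no lifetime, so creating one with as_mut_ptr() is allowed in safe code even though the Vec it points into is about to be dropped. The pointer is only used inside the unsafe block, where Rust trusts that the developer understands what they are doing and has made sure it is still valid, so no memory-safety error is reported.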
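
The fix can be tried without a C library by swapping in a stand-in for the foreign function. In the sketch below, array_sum_stub and sum_with_live_vec are hypothetical names, and the stub is written in Rust only so that the example runs on its own; the real array_sum is the C function from the question.

```rust
use std::ffi::c_int;

// Hypothetical stand-in for the C function, written in Rust so the sketch
// runs without linking a C library.
unsafe fn array_sum_stub(array: *mut c_int, array_size: c_int) -> c_int {
    // SAFETY (caller's duty): `array` must point to `array_size` valid ints.
    let items = unsafe { std::slice::from_raw_parts(array, array_size as usize) };
    items.iter().sum()
}

fn sum_with_live_vec() -> c_int {
    let mut array: Vec<c_int> = vec![6991; 10];
    let size = array.len() as c_int;
    // `array` is still in scope here, so the pointer is valid for the call.
    unsafe { array_sum_stub(array.as_mut_ptr(), size) }
}

fn main() {
    println!("The stub says the sum was: {}", sum_with_live_vec());
}
```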

Question 8

Question

The final question of the exam will be a more open-ended question which will ask you to perform some analysis or make an argument. Your argument will be judged alike an essay (are your claims substantiated by compelling arguments). Remember that you will not get any marks for blindly supporting Rust.

A friend of yours has just read this article, and thinks that it means they shouldn't learn Rust.

Read through the article, and discuss the following prompt:

Rust is not worth learning, as explained by this article.

The overall structure of your answer is not marked. For example, your answer may include small paragraphs of prose accompanied by dot-points, or could instead be posed as a verbal discussion with your friend. Regardless of the structure / formatting you choose, the substance of what you write is the most important factor, and is what will determine your overall mark for this question.
