CS5340 Assignment 4 (Semester 1, AY2022/2023)

CS5340 ASSIGNMENT 4:

MONTE CARLO INFERENCE

1. OVERVIEW

In this assignment, you will write code to perform Monte Carlo inference, i.e. importance sampling and Gibbs sampling. Often, marginalization over large/continuous probability distributions can be intractable. In Monte Carlo inference, we circumvent this problem by sampling from simpler proposal distributions to approximate our target distribution, making inference much more tractable for complex distributions.

References: Lecture 9

Honour Code. This coding assignment constitutes 15% of your final grade in CS5340. Note that plagiarism will not be condoned! You may discuss with your classmates and check the internet for references, but you MUST NOT submit code/report that is copied directly from other sources!

2. SUBMISSION INSTRUCTIONS

Items to be submitted:

• Source code (zip file containing folders part1 and part2):

o part1/main.py – code for importance sampling.

o part2/main.py – code for Gibbs sampling.

• Report (report.pdf). This should describe your implementation and be no more than one page.

Please indicate clearly your name and student number (the one that looks like A1234567X) in the report as well as at the top of your source code. Zip the two files together and name the zip file in the following format: A1234567X_lab4.zip (replace with your student number).

Submit your assignment by 31 October 2022, 2359HRS to LumiNUS. 25% of the total score will be deducted for each day of late submission.

3. GETTING STARTED

This assignment, as well as the subsequent ones, requires Python 3.5 or later. You need certain Python packages, which can be installed using the following command:

pip install -r requirements.txt

If you have any issues with the installation, please post them in the forum, so that other students or the instructors can help accordingly.



4. TEST CASES

To help with your implementation, we have provided a few sample inputs. They can be found in the data/inputs folder. The ground-truth files can be found in data/ground-truth. Your predictions will be stored in data/predictions. For test cases 1 and 2, the inputs will be based on the example given in Lecture 9.

For case 3, we will use a slightly larger graph. Note that the ground truth might differ slightly from your answers; we will assume a tolerance of up to 1 decimal place. During grading, your code will be evaluated on hidden test cases on top of the validation test cases we have provided.

Additionally, we expect your code to run in a reasonable amount of time; anything within 2-3x of our timings below is reasonable:

Case                   1     2     3     4
Importance Sampling    10s   5s    15s   19s
Gibbs Sampling         13s   7s    29s   21s

Table 1: Timings on an i5-1035G1 CPU @ 1.00GHz.

For fairness, we will evaluate your timings on the same system.

5. IMPORTANCE SAMPLING

In part 1, i.e. importance sampling, you will be provided with local target conditional distributions p(x_i | pa(x_i)) and local proposal conditional distributions q(x_i | pa(x_i)), where pa(x_i) denotes the parents of node x_i. In this assignment, we assume the proposal distribution has the form q(x) = ∏_i q(x_i | pa(x_i)) and the target distribution has the form p(x) = ∏_i p(x_i | pa(x_i)). Each sample is then weighted by the ratio p(x)/q(x) to obtain the approximate distribution. You are expected to return the conditional probability distribution p(X_Q | X_E), where the nodes in the graph are X = X_Q ∪ X_E, with X_Q being the query nodes and X_E being the evidence nodes.


Note that during sampling, students are expected to perform sampling in topological order, i.e. parent nodes should be sampled before child nodes, and the input samples should be updated with the parent samples before sampling from the child node.
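The topological sampling step above can be sketched as follows. This is a minimal illustration, not the provided skeleton: the names `sample_topological`, `parents`, and `proposal`, and the representation of distributions as functions returning probability vectors, are all assumptions for the sake of the example.

```python
import numpy as np

def sample_topological(nodes, parents, proposal, rng):
    """Draw one sample of every node, parents before children.

    nodes:    node ids, assumed already sorted in topological order.
    parents:  dict mapping node -> list of parent node ids.
    proposal: dict mapping node -> function(parent_sample_dict) that
              returns a probability vector over that node's states.
    """
    sample = {}
    for node in nodes:
        # Condition the local proposal on the parents sampled so far;
        # topological order guarantees every parent is already in `sample`.
        parent_vals = {p: sample[p] for p in parents[node]}
        probs = proposal[node](parent_vals)
        sample[node] = rng.choice(len(probs), p=probs)
    return sample
```

Note that each node is drawn from its own local proposal; no joint proposal distribution is ever formed, matching the requirement above.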

You will implement the following functions:

• Performs sampling for a single iteration (all nodes sampled once): _sample_step() [3 points]

o Students are expected to sample from the local proposal distributions for each node. Do not sample from a joint proposal distribution.

o Sampling should be done in topological order. For example, in the example from Lecture 9, sampling in the order 1, 2, 3, 4, 5 would be one way to sample topologically.

o Once a parent node is sampled, use the parent's sample as the observed value of that parent in the child node's probability distribution.

• Performs sampling for N iterations, weights the samples, and returns the approximate conditional probability p(X_Q | X_E): _get_conditional_probability() [4 points]

o Do not compute the joint proposal distribution or joint target distribution to get p(x) and q(x). You are expected to obtain these values by taking the product of the local proposal/target conditional distribution values.

Note that we will also provide evidence variables, and the graph structure must be updated with the evidence variables.

Hint: It might be useful to create additional functions for this part. Place these functions between the designated comment blocks for each file.
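One way to picture the weighting step is sketched below. This is an illustrative outline under assumed data structures (samples as dicts, local distributions as functions), not the assignment's actual interface: each sample's weight is a product of local p/q ratios, and the weighted counts are normalized to estimate the conditional probability.

```python
from collections import defaultdict

def local_weight(sample, p_locals, q_locals):
    """w(x) = prod_i p(x_i | pa_i) / prod_i q(x_i | pa_i), built from one
    local factor per node -- the joints are never formed explicitly."""
    w = 1.0
    for node in p_locals:
        w *= p_locals[node](sample) / q_locals[node](sample)
    return w

def importance_estimate(samples, weight_fn, query_node):
    """Estimate p(query | evidence) from weighted samples.

    samples:   list of dicts, each a full sample of all nodes.
    weight_fn: maps a sample to its importance weight w = p(x)/q(x).
    """
    totals = defaultdict(float)
    for s in samples:
        totals[s[query_node]] += weight_fn(s)
    z = sum(totals.values())  # normalize the weighted histogram
    return {val: w / z for val, w in totals.items()}
```

With uniform weights this reduces to a plain histogram of the query node's samples, which is a quick sanity check for an implementation.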

6. GIBBS SAMPLING

In part 2, i.e. Gibbs sampling, you will be provided with the conditional probabilities p(x_i | X \ {x_i}) for each node x_i. You will sample from this conditional distribution for each node in turn, holding the other nodes fixed. The samples are then weighted equally to obtain the approximate distribution p(X_Q | X_E), where X_Q are the query nodes and X_E are the evidence nodes.

• Performs sampling for a single iteration (all nodes sampled once): _sample_step() [3 points]

o Sampling should be done as in Lecture 9's example on Gibbs sampling.

• Performs sampling for N iterations and returns the approximate distribution: _get_conditional_probability() [5 points]

o Students are expected to reduce the proposal distribution for each node to its Markov blanket. Students are not allowed to use networkx.markov_blanket() to compute it.

Note that we will also provide evidence variables, and the graph structure must be updated with the evidence variables.

Hint: It might be useful to create additional functions for this part. Place these functions between the designated comment blocks for each file. Some points are awarded for more efficient implementations.
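A single Gibbs sweep can be sketched as follows. The names `gibbs_step` and `conditional` are illustrative assumptions, not the skeleton's interface; each node is resampled from its full conditional while the current values of all other nodes are held fixed, and in practice that conditional only needs to read the node's Markov blanket.

```python
import numpy as np

def gibbs_step(sample, nodes, conditional, rng):
    """One Gibbs sweep: resample each node from p(x_i | X \\ {x_i}),
    holding the rest of the current sample fixed.

    conditional: dict mapping node -> function(current_sample_dict) that
                 returns a probability vector over that node's states
                 (it need only depend on the node's Markov blanket).
    """
    for node in nodes:
        probs = conditional[node](sample)
        # Update in place so later nodes in this sweep see the new value.
        sample[node] = rng.choice(len(probs), p=probs)
    return sample
```

Repeating this step for N iterations (typically after discarding an initial burn-in) and histogramming the query nodes with equal weights yields the approximate conditional distribution described above.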