Homework 7: Cycles in graphs

  • This homework is due at 11:59.59pm on Friday, April 8, 2022.
  • Huge Thank You! @adamdavisonsmith for providing framework for this assignment.

Description and Deliverables

In this assignment, you will use your understanding of the shortest path algorithms that we discussed in class to solve some programming problems related to finding cycles in a graph. I highly recommend that you start early.

To receive full credit for this project, you will turn in the following files:

  • A file named unitCycle.py that contains your implementation of a cycle-finding routine for directed graphs with unweighted edges.

  • A file named negCycle.py that contains your implementation of a negative weight cycle finding routine for directed graphs with arbitrary edge weights.

Each of these files is described in greater detail below.

You will need to program this assignment in Python for the autograder to work. We will provide several resources, and several starter files for you to begin your implementation. In particular, you can begin this assignment by downloading the following files:

Introduction

We have provided you with a python library called simplegraphs.py to get you started.

This library provides a collection of functions for manipulating graphs as python dictionaries. There are procedures for reading a graph from a file that describes it, writing the graph to a file, and other basic manipulations. It includes code for BFS and DFS and Dijkstra’s algorithm.

You can use any of these in your solution. Do not change this file in your submission (since the autograder will include the version that we distributed). You can create your own version and modify the functions there if you like.

Finding cycles with unit edge weights

The first task of the assignment is to find cycles in graphs that have unit edge weights. The BFS procedure that we discussed in class can be used to find the cycle, but an easier implementation is to use Depth First Search (or DFS). The main difference with DFS is that instead of using a queue, DFS uses a stack to store the nodes as they are visited. Since recursion naturally provides us with a stack, we can develop a simple implementation of DFS.

Take a look at the following code found in simplegraphs.py which should be the basis for your solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def DFS(G):
    # returns three dictionaries: discovered, finished, parent
    # discovered[u] records when the node was first encountered
    # finished[u] records the step when the node's exploration was done
    # parent[u] records the first node that "added" this node to the stack
    color = {}
    discovered = {}
    finished = {}
    parent = {}
    for u in G["adj"]:
        color[u] = "white"
        parent[u] = None
    timestamp = [0] #This is a list whose only element is the current value of the time stamp. 

    def DFSVisit(u,  G, timestamp, color, discovered, finished):
        # Only the first argument ever changes
        color[u] = "gray"
        timestamp[0] = timestamp[0] + 1
        discovered[u] = timestamp[0]
        for v in G["adj"][u]:
            if color[v] == "white":
                    parent[v] = u
                    DFSVisit(v,  G, timestamp, color, discovered, finished)
        color[u] = "black"
        timestamp[0] = timestamp[0] + 1
        finished[u] = timestamp[0]
        return

    for u in G["adj"]:
        if color[u] == "white":
            DFSVisit(u, G, timestamp, color, discovered, finished)
    return discovered, finished, parent

We begin by coloring each node white and settings the parent pointer to be None. We then proceed to loop over all of the nodes of the graph, and if the color is white (which means that the procedure has not visited the node yet), we run the DFSVisit procedure. This procedure colors the current node gray, and then proceeds to explore all of its neighbors. During the loop, for every node that is still white, the parent node is set to u, and a new visit procedure is initiated. After visiting all neighbors, the node’s color is changed to black to indicate it has been processed.

To illustrate how this works, you can run the DFS procedure on your own graphs using the following program:

def main(args = []):
    if len(args) < 1:
        print('Too few arguments! Usage: python3 dfs.py <filename>')
        return
    graph_file = args[0]
    G = sg.readGraph(graph_file) # Read the graph from disk
    discovered, finished, parent = sg.DFS(G)
    print(discovered)
    print(finished)
    print(parent)
    return

if __name__ == "__main__":
    main(sys.argv[1:])   

You can copy and paste this code into your own python file called dfs.py and then run:

$ python3 dfs.py 15.2.txt

using the 15.2.txt file.

Here is a depiction of the graph:

15.2 graph

and here is the same graph with the (discovered, finished) value in blue, and the parent pointers depicted as the red arrows.

15.2 graph

Thus, you can see that the DFS algorithm visited the nodes in order 0,3,2,10,6,7,5,8,14,9,4; then the algorithm finished node 4,9,14,8; and then started visiting 12,11, then finished node 11, 12, 5, 7, 6,10,2,3, and finally 0.

  1. Practice problem: To make sure that you understand how to use the code, what are the discovered, finished, and parent attributes of node 500 in graph 715.4.txt ?

Key Ideas for finding cycles

The key idea to finding a cycle (not necessarily the shortest one) is to keep track of the nodes that are currently being explored, and add a special case whenever the DFS encounters such a node. Indeed, line 17 of DFS colors the node u to gray to indicate that the node is being explored.

Within the loop that begins at line 20, line 21 includes a case for white nodes, (i.e., nodes that have not been visited yet). The key to finding cycles is to add a special case for when a gray node is encountered in this loop. Finding a gray node means that the DFS procedure has started from a node and followed a sequence of edges that has lead back to that same node.

At this point, the procedure can follow the parent pointers to create the cycle. Furthermore, one can determine an upper-bound on the length of the cycle by using the timestamps in the discovered array. For graphs that have out-degree 1 (i.e., at most one out-going edge), this upper-bound will also be the length of the cycle. (Note: this observation can help you efficiently find the shortest cycles in certain kinds of graphs that you might encounter.)

Specifically, consider making the following changes to the code (our solution takes 10 extra lines):

  • Create a new function DFSFindCycle(G) that takes graph G as an input and returns a list containing a cycle.

  • Add and initialize a new array called cycle to keep track of the shortest cycle that has been detected so far. This list should start off empty. (You can return any cycle. If you want to find shorter ones, you can try to use an optimization.)

  • Add a special case after Line 23 to check whether the color is gray.

    • If so, compute the actual cycle: start at node $u$ and continue to follow parent pointers until $v$ is encountered.
    • Since these are parent pointers, reverse this list to create the actual cycle.
  • At the end of the procedure, return the list cycle

Sample:

$ python3 unitCycle.py <graph> <output>

When run on the sample above,

$ python3 unitCycle.py 15.2.txt 15.2.txt.out
$ cat 15.2.txt.out
[12, 5]

Notes:

  1. We will be testing your code on inputs that are not yet in Gradescope. You may need to generate sample test graphs to test how well your solution works on large inputs.

  2. In a graph with multiple cycles, you can return any cycle. Our autograder will follow the nodes to ensure that they form a cycle in the graph.

What to submit

  1. You should submit a file called unitCycle.py containing your solution starter file. We will test your program by running the following command:
python3 unitCycle.py graph_file output_file

There are two arguments as before.

  • graph_file describes the graph. This uses the format that is read by simplegraphs.py. Each line lists an edge in the graph, assuming the node IDs are integers. Each line lists two vertices (ints) and a weight (float). Your program will ignore the weights.

  • output_file is where your program will write its output. Code for writing the output is provided. The only line should be a list which contains the nodes in the cycle. The node IDs should be in order, but they can start anywhere on the cycle. The autograder checks that they form a directed cycle.

Finding cycles in graphs with negative edge weights

In the next part of the assignment, your goal is to modify Bellman-Ford algorithm, so that, given a directed graph $G$ with edge costs, it will find a negative-cost directed cycle if one exists. This routine can be used to find arbitrage opportunities as we discussed in class.

  • Inputs: An directed graph $G=(V,E)$ with edge weights that may be positive, zero, or negative.
  • Outputs: A boolean (true/false) indicating whether a negative-cost cycle exists, as well as a list of nodes that form a negative cost cycle (or the emty list if none exists).

Design an algorithm for this task and program it in Python. Your algorithm should run in worst-case time $O(EV)$.

A simple strategy is to modify Bellman-Ford, as described in the Kleinberg-Tardos textbook and discussed in class. Here are some ideas you’ll have to incorporate:

  1. Recall from class, we noted that the distance estimates in the Bellman-Ford algorithm will fail to converge when there is a negative-cost cycle that is reachable from the source node. The problem is that we don’t know which source node to start from. A simple solution (also explained in KT) is to add a new ``dummy’' source node $s$ that has 0-cost edges to all other nodes, and no incoming edges. If there is a negative-cost cycle somewhere in the graph, you can find it by running Bellman-Ford from this new source. To add nodes to a graph, consider making a copy of the graph (see the sg.copyGraph(G) method in the library), and then use the sg.addDirEdge(...) method.

  2. Faster executions when cycles are short: To get full credit, your algorithm will have to run quickly on graphs where running Bellman-Ford for $V$ iterations would take too long. Specifically, when there exists a negative-cost cycle of length $\ell$, your algorithm should run in time $O(\ell E)$.

    Hint: Here is one idea for how to do this. After each iteration of Bellman-Ford, check if there is a cycle in the parent pointers that the algorithm is maintaining. If there is, then it turns out you are done (why?). What method can you use to check for this cycle?

  3. The autograder will provide feedback on running time—generally, 15 seconds per example. Of course, you should focus at first on getting an algorithm that is correct and then focus on optimizations.

  4. We will be testing your program on hidden inputs that are not yet in Gradescope. Therefore, you may want to create larger sample input graphs to test your program and ensure it runs within time. To do so, you can use the sg.randomDigraphDegreeBound(,) or other methods in the library or of your own invention.

  5. Do not use gradescope for debugging purposes. You should develop your code locally on your own machine, or in an online interpreter (e.g., Jupyter notebook). Each run of Gradescope takes several minutes, and therefore you will spend a lot of time if you try to hack changes, and resubmit in order to test your implementation. Please reach out to the course staff for help on Python if you need a tutorial.

What to submit

You should submit a file called negCycle.py containing your solution (based on the template). We will test your program by running the following command:

$ python3 negCycle.py graph_file output_file

There are two arguments as before.

  • graph_file describes the graph. This uses the format that is read by simplegraphs.py. Each line lists an edge in the graph, assuming the node IDs are integers. Each line lists two vertices (ints) and a weight (float).

  • output_file is where your program will write its output. Code for writing the output is provided. The first line is boolean (either True or False) and the second line is a list of the IDs of the nodes in the negative weight cycle cycle (an empty list if no cycle was found). The node IDs should be in order, but they can start anywhere on the cycle. The autograder checks that they form a directed cycle of the correct weight.

Submit Your Project

To submit your project, follow these steps:

  1. Produce your files

    • unitCycle.py
    • negCycle.py
  2. Go to gradescope and submit your assignment.

  • Select the appropriate assignment and then choose “Upload Assignment” and drag and drop these two files.

  • You can submit multiple times before the deadline. Your last submission will determine your grade.

  • After the due date, we will upload extra test cases and run the autograder script. You will be able to see your grade and assignment feedback on Gradescope. Grades will also be synched with Canvas. The point of this is to encourage you to think analytically about your solution to ensure that it covers all of the cases. This is practice for the real world.

Grading

We will include 10-20 hidden test cases, and each one will be worth between 5-15 points.