Introduction to Sets & Maps
Authors: Darren Yao, Benjamin Qi, Allen Li, DRGSH
Prerequisites
Sorting, and maintaining collections of distinct elements with ordered sets.
Resources | |||
---|---|---|---|
IUSACO | module is based off this | ||
CPH | covers similar material |
Both Java and C++ contain two versions of sets and maps; one using sorting and the other using hashing. We'll only introduce the former version in this module.
Sets
Focus Problem – read through this problem before continuing!
A set is a collection of objects that contains no duplicates. In ordered sets, the entries are sorted in order of key. Insertions, deletions, and searches are all , where is the number of elements in the set.
C++
The operations on a C++ set
include:
insert
, which adds an element to the set if not already present.erase
, which deletes an element if it exists.count
, which returns1
if the set contains the element and0
if it doesn't.
1set<int> s;2s.insert(1); // [1]3s.insert(4); // [1, 4]4s.insert(2); // [1, 2, 4]5s.insert(1); // [1, 2, 4]6// the add method did nothing because 1 was already in the set7cout << s.count(1) << endl; // 18s.erase(1); // [2, 4]9cout << s.count(5) << endl; // 010s.erase(0); // [2, 4]
Java
The operations on a TreeSet
are add
, which adds an element to the set if not already present, remove
, which deletes an element if it exists, and contains
, which checks whether the set contains that element.
1Set<Integer> set = new TreeSet<Integer>();2set.add(1); // [1]3set.add(4); // [1, 4]4set.add(2); // [1, 2, 4]5set.add(1); // [1, 2, 4]6// the add method did nothing because 1 was already in the set7System.out.println(set.contains(1)); // true8set.remove(1); // [2, 4]9System.out.println(set.contains(5)); // false10set.remove(0); // [2, 4]
Python
Warning!
Ordered sets and maps are not built into Python. The Python OrderedDict stores keys in the same order as they were inserted in, not in sorted order.
The built in python unsorted set supports:
add()
: Adds element to setremove()
: Removes element from setx in set
: Checks if element x is in the set
1set = set()2set.add(1) # {1}3set.add(4) # {1, 4}4set.add(2) # {1, 4, 2}5set.add(1) # {1, 4, 2}6# the add method did nothing because 1 was already in the set7print(1 in set) # True8set.remove(1) # {4, 2}9print(5 in set) # False10set.remove(0); # {4, 2}
Solution - Distinct Numbers
This problem asks us to calculate the number of distinct values in a given list.
Method 1 - Set
This is probably the easier of the two methods, but requires knowledge of sets. Because sets only store one copy of each value, we can insert all the numbers into a set, and then print out the size of the set.
C++
1#include <bits/stdc++.h>23using namespace std;45int main() {67 int n;8 cin >> n;910 set<int> distinctNumbers;
Java
1// Source: Daniel23import java.io.BufferedReader;4import java.io.IOException;5import java.io.InputStreamReader;6import java.util.HashSet;7import java.util.StringTokenizer;89public class DistinctNumbers {10
Python
1n = int(input()) # unused2nums = [int(x) for x in input().split()]3distinct_nums = set(nums)4print(len(distinct_nums))
We can do this more efficiently by skipping the creation of the list, and use a set comprehension directly:
1n = int(input()) # unused2distinct_nums = {int(x) for x in input().split()}3print(len(distinct_nums))
Method 2 - Sorting the Array
This problem is also solvable without sets. Sort a vector containing all the numbers. The answer is the number of times two adjacent numbers are not equal; more formally, the number of times , plus one.
C++
1#include <bits/stdc++.h>23using namespace std;45int main() {67 int n;8 cin >> n;910 int A[n];
Java
1import java.io.*;2import java.util.*;34public class DistinctNumbers {56 public static void main(String[] args) throws IOException {7 BufferedReader br = new BufferedReader(new InputStreamReader(System.in));8 int n = Integer.parseInt(br.readLine());9 StringTokenizer st = new StringTokenizer(br.readLine());10 int[] arr = new int[n];
Python
1n = int(input())2arr = sorted([int(x) for x in input().split()])3print(1+sum(a != b for a, b in zip(arr, arr[1:]))
Maps
Focus Problem – read through this problem before continuing!
A map is a set of ordered pairs, each containing a key and a value. In a map, all keys are required to be unique, but values can be repeated. Maps have three primary methods:
- one to add a specified key-value pairing
- one to retrieve the value for a given key
- one to remove a key-value pairing from the map
Insertions, deletions, and searches are all , where is the number of elements in the map.
C++
In a C++ map
m
:
- the
m[key] = value
operator assigns a value to a key and places the key and value pair into the map. The operatorm[key]
returns the value associated with the key. If the key is not present in the map, thenm[key]
is set to 0. - The
count(key)
method returns the number of times the key is in the map (which is either one or zero), and therefore checks whether a key exists in the map. - Lastly,
erase(key)
removes the map entry associated with the specified key.
1map<int, int> m;2m[1] = 5; // [(1, 5)]3m[3] = 14; // [(1, 5); (3, 14)]4m[2] = 7; // [(1, 5); (2, 7); (3, 14)]5m[0] = -1; // [(0, -1); (1, 5); (2, 7); (3, 14)]6m.erase(2); // [(0, -1); (1, 5); (3, 14)]7cout << m[1] << '\n'; // 58cout << m.count(7) << '\n' ; // 09cout << m.count(1) << '\n' ; // 1
Java
In a TreeMap
, the put(key, value)
method assigns a value to a key and places the key and value pair into the map. The get(key)
method returns the value associated with the key. The containsKey(key)
method checks whether a key exists in the map. Lastly, remove(key)
removes the map entry associated with the specified key. All of these operations are , but again, due to the hashing, this has a high constant factor.
1Map<Integer, Integer> map = new TreeMap<Integer, Integer>();2map.put(1, 5); // [(1, 5)]3map.put(3, 14); // [(1, 5); (3, 14)]4map.put(2, 7); // [(1, 5); (2, 7); (3, 14)]5map.remove(2); // [(1, 5); (3, 14)]6System.out.println(map.get(1)); // 57System.out.println(map.containsKey(7)); // false8System.out.println(map.containsKey(1)); // true
Python
Colloquially, maps are referred to as dicts in python. They act as hash maps, so they actually have insertion, deletion, and searches.
1d = {}2d[1] = 5 # {1: 5}3d[3] = 14 # {1: 5, 3: 14}4d[2] = 7 # {1: 5, 2: 7, 3: 14}5del d[2] # {1: 5, 3: 14}6print(d[1]) # 57print(7 in d) # False8print(1 in d) # True
Iterating Over Maps
C++
To iterate over maps, you can use a for
loop.
1for (pair<int,int> x : m) {2 cout << x.first << " " << x.second << '\n';3}4for (auto x : m) {5 cout << x.first << " " << x.second << '\n';6}78/* both output the following:90 -1101 5
The map stores pairs in the form {key, value}
. The auto
keyword suffices to iterate over any type of pair. You can use these pairs normally, as introduced in this module.
Additionally, you can pass by reference when iterating over a map, like this:
1for (auto& x : m) {2 x.second = 3;3}4for (pair<int,int> x : m) {5 cout << x.first << " " << x.second << '\n';6}78/*90 3101 3
This allows you to modify the values of the pairs stored in the map.
Java
To iterate over maps, you can use a for
loop over the keys.
1for (int k : m.keySet()){2 System.out.println(k + " " + m.get(k));3}
Python
To iterate over dict
s, there are three options.
Dicts will be returned in the same order of insertion in Python 3.6+.
You can iterate over the keys:
1for key in d:2 print(key)
You can iterate over the values:
1for value in d.values():2 print(value)
You can iterate over the key-value pairs:
1for key, value in d.items():2 print(key, value)
Inserting / Deleting Keys While Iterating
While you are free to change the values in a map when iterating over it (as demonstrated above), be careful about inserting and deleting keys while iterating.
Python
This code will give a runtime error (although similar code will create a map
with 11 entries in C++):
1def iterate_insert():2 d = {0:0}34 for key in d:5 if key == 10:6 break7 d[key] = 58 d[key+1] = 0910 print("ENTRIES:")
Traceback (most recent call last): File "test.py", line 17, in <module> iterate_insert() File "test.py", line 7, in iterate_insert for key in d: RuntimeError: dictionary changed size during iteration
If you want to remove every third entry from a map, one way is to just create a new map.
1d = {i: i for i in range(10)}2d_new = dict(item for i, item in enumerate(d.items()) if i % 3 != 2)34print("new dict:", d_new)5# new dict: {0: 0, 1: 1, 3: 3, 4: 4, 6: 6, 7: 7, 9: 9}
Another is to maintain a list of all the keys you want to delete and remove them after the iteration finishes:
1d = {i: i for i in range(10)}2to_remove = {key for i, key in enumerate(d) if i % 3 == 2}34for key in to_remove:5 del d[key]6print("new dict:", d)7# new dict: {0: 0, 1: 1, 3: 3, 4: 4, 6: 6, 7: 7, 9: 9}
C++
This code will work (adding keys while iterating over a map):
1void iterate_insert() {2 map<int,int> m; m[0] = 0; //starts with a single key3 for (auto& p: m) { //adds keys in the loop until the key 104 if (p.f == 10) break;5 p.s = 5;6 m[p.f+1] = 0;7 }8 cout << "ENTRIES:\n";9 for (pair<int,int> p: m)10 cout << p.f << " " << p.s << "\n";
However, consider the following code, which attempts to remove every third entry from a map.
1void iterate_remove_bad() {2 map<int,int> m; for (int i = 0; i < 10; ++i) m[i] = i;3 int cnt = 0;4 for (auto it = begin(m); it != end(m); ++it) {5 cout << "CURRENT KEY: " << it->f << "\n";6 cnt ++;7 if (cnt%3 == 0) m.erase(it);8 }9 cout << "REMAINING ENTRIES:\n";10 for (pair<int,int> p: m)
However, we would expect the keys 2, 5, and 8 to be removed from the map, but this is not the case. 2 is correctly removed, but the next key removed is 4, not 5! And it seems that some keys are appearing more than once during the iteration.
As the documentation for erase
mentions, "iterators, pointers and references referring to elements removed by the function are invalidated." So incrementing it
after it
has been erased from the map might not produce the intended result. If you're lucky, this will produce a segmentation fault. Unfortunately, sometimes (as in this case) the code will run without appearing to produce an error.
If we compile using -D_GLIBCXX_DEBUG
and run the above, then
g++ -D_GLIBCXX_DEBUG whoops.cpp -o whoops && ./whoops
gives an error, as expected.
CURRENT KEY: 0 CURRENT KEY: 1 CURRENT KEY: 2 /usr/local/Cellar/gcc/10.1.0/include/c++/10.1.0/debug/safe_iterator.h:328: In function: __gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>& __gnu_debug::_Safe_iterator<_Iterator, _Sequence, _Category>::operator++() [with _Iterator = std::_Rb_tree_iterator<std::pair<const int, int> >; _Sequence = std::__debug::map<int, int>; _Category = std::forward_iterator_tag] Error: attempt to increment a singular iterator. Objects involved in the operation: iterator "this" @ 0x0x7ffee963c870 { type = std::_Rb_tree_iterator<std::pair<int const, int> > (mutable iterator); state = singular; references sequence with type 'std::__debug::map<int, int, std::less<int>, std::allocator<std::pair<int const, int> > >' @ 0x0x7ffee963c8b0 } zsh: abort ./whoops
Similarly, in Java,
If the map is modified while an iteration over the collection is in progress (except through the iterator's own remove operation), the results of the iteration are undefined.
As suggested by this StackOverflow post, the following code produces the intended results.
1void iterate_remove_ok() {2 map<int,int> m; for (int i = 0; i < 10; ++i) m[i] = i;3 int cnt = 0;4 for (auto it = begin(m), next_it = it; it != end(m); it = next_it) {5 ++next_it;6 cout << "CURRENT KEY: " << it->f << "\n";7 ++cnt;8 if (cnt%3 == 0) {9 m.erase(it);10 }
You could also just create a new map instead of removing from the old one.
1void iterate_remove_ok_2() {2 map<int,int> m, M; for (int i = 0; i < 10; ++i) m[i] = i;3 int cnt = 0;4 for (pair<int,int> p: m) {5 ++cnt;6 if (cnt%3 != 0) M[p.f] = p.s;7 }8 swap(m,M);9 cout << "REMAINING ENTRIES:\n";10 for (pair<int,int> p: m)
Java
Modifying a Collection (Set
, Map
, etc.) in the middle of a for-each loop is unlikely to work, as it will probably cause a ConcurrentModificationException. See the following snippet for an example:
1void iterate_remove_set_BAD() {2 Set<Integer> s = new TreeSet<Integer>();3 s.add(0); s.add(1); s.add(2);4 for(Integer a : s) {5 s.remove(a); // ConcurrentModificationException thrown!!6 }7}
One work-around is to use Iterator
and the .remove()
method to remove elements while looping over them, like in the next code snippet:
1void iterate_remove_set() {2 Set<Integer> s = new TreeSet<Integer>();3 //s starts as {0, 1, 2}4 s.add(0); s.add(1); s.add(2);56 Iterator<Integer> iter = s.iterator();7 while(iter.hasNext()) {8 int key = iter.next();9 if(key == 0 || key == 2)10 iter.remove();
However, Iterator
is not commonly seen in Java, so the best option (in most cases) if you want to remove/insert mutiple elements at once is to use your Container's .addAll(c)
or .removeAll(c)
methods. That means that you should put all the elements you want to remove (or add) in a new Collection, and then use that new Collection as the parameter of the .addAll(c)
or .removeAll(c)
method that you call on your original Collection. See the following code snippet for an example (it works equivalently to the code above):
1void iterate_remove_set_good() {2 Set<Integer> s = new TreeSet<Integer>();3 //s starts as {0, 1, 2}4 s.add(0); s.add(1); s.add(2);56 Set<Integer> toRemove = new TreeSet<Integer>();7 for(Integer a : s) {8 if(a == 0 || a == 2) toRemove.add(a);9 }10
Problems
Some of these problems can be solved by sorting alone, though sets or maps could make their implementation easier.
Status | Source | Problem Name | Difficulty | Tags | Solution | URL |
---|---|---|---|---|---|---|
CSES | Easy | |||||
Bronze | Easy | External Sol | ||||
Bronze | Normal | |||||
Silver | Normal |
Module Progress:
Join the USACO Forum!
Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!