- COMP.CS.140
- 3. Java as Programming Language
- 3.1 More basic features of Java
- 3.1.2 Basic Java containers
Basic Java containers¶
The Java standard class library offers similar container classes and sorting functions as e.g. C++ and Python libraries. We will review some of them in this section. We will not even try to go through all the details. You are recommended to also look at the Java Collections framework documentation for more information. Java Collections framework is a part of the Java class library dedicated to storing and manipulating collections.
It is good to note one important thing right in the beginning: Java containers can store only objects. If you wish to store primitive values as such, you are pretty much limited to using plain arrays. Java containers are commonly used with primitive values: the actually stored values will then just need to be boxed into wrapper class objects that can be later unboxed when read into primitive variables. This is convenient enough from a programmer’s point of view due to how Java performan the (un)boxing automatically.
The generic class java.util.ArrayList
provides a similar dynamic array as e.g. vector
in
C++ or list
in Python. We will discuss generics a little later, but the term refers to the
fact that we give an item type as a type parameter when using the ArrayList
class. By a dynamic
array we mean an array whose size can be changed e.g. by adding items to its end.
A list?
Not that the term “list” is not confined to mean just e.g. a linked list: a list is in general
a collections of items that have a linear order, that is, where we can unambiguously refer to
e.g. the first or the last item, or in general to the item at index i
. As the name suggests,
ArrayList
is a list that stores its items in an array (and is reminiscent of Python’s
list
). We do not discuss it further in this section, but Java also contains e.g. the class
LinkedList
which, as the name suggests, is a list that stores its items in a linked list.
Java’s generic classes use similar syntax as C++: the type parameter is given inside angle brackets
after the class name. E.g. ArrayList<Integer>
is an ArrayList
array for storing Integer
objects, and ArrayList<ArrayList<String>>
is an ArrayList
array whose items are
ArrayList
arrays for storing String
objects (so it would in effect be a two dimensional
String
array). It is also possible to store plain arrays into an ArrayList
array: e.g.
also ArrayList<String[]>
would be a two dimensional String
array (although less dynamic
than the preceding one). The ArrayList
class contains e.g. the following member functions:
add(item)
: additem
to the end of this array (and the array grows by one).addAll(container)
: add all items ofcontainer
to the end of this array (as if they were added individually usingadd
).get(i)
: return the item at indexi
.set(i, item)
: storeitem
to the (existing) indexi
.
The following example code stores the 10 first Fibonacci numbers into an ArrayList
array.
// Note how the angle brackets on the right side are empty. Java allows to leave out the
// type parameters when using the new operator if the Java compiler can infer them from
// the target variable type (here the type of fibonacci, that is, ArrayList<Integer>).
// It would of course be legal to use the full form new ArrayList<Integer>().
// The array will be initially empty.
ArrayList<Integer> fibonacci = new ArrayList<>();
// Add the first 2 Fibonacci numbers to the beginning of the array.
fibonacci.add(0); // The first Fibonacci number is 0.
fibonacci.add(1); // The second Fibonacci number is 1.
for(int i = 1; i < 10; i++) {
// Add the rest: the next Fibonacci number is always the sum of the previous two.
fibonacci.add(fibonacci.get(i-1) + fibonacci.get(i));
}
// Prints out "[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]".
System.out.println(fibonacci);
The generic classes HashSet
and TreeSet
are set classes similar to unordered_set
and set
in C++ and set
in Python. They store a collection of items without duplicates:
no two items in a set are similar (compare to be equal). Set classes are much more efficient that
arrays especially in situations where we want to check whether a set contains a certain item.
The two Java set classes have the same difference as the corresponding C++ classes: HashSet
is based on a hash table and TreeSet
on a balanced search tree. In terms of practice, one
sometimes relevant consequence is that HashSet
maintains its items in random order (due to how
hashing works) whereas TreeSet
maintains its items in ascending order.
Both HashSet
and TreeSet
offer e.g. the following member functions:
add(item)
: additem
to this set unless the set already contains a similar item.addAll(container)
: add all items ofcontainer
to this set (as if they were added individually usingadd
).contains(item)
: returns aboolean
that tells whether this set contains an item similar toitem
.remove(item)
: removesitem
from this set (if the set contains an item similar toitem
).
TreeSet
in addition has e.g. the following member functions that rely on item order:
first()
: returns the smallest item of this set.last()
: returns the largest item of this set.pollFirst()
: returns and removes the smallets item of this set.pollLast()
: returns and removes the largest item of this set.
The next code example demonstrates basic use of set containers. Only some of the above mentioned functions are used here.
import java.util.HashSet;
import java.util.TreeSet;
class Sets {
public static void main(String[] args) {
HashSet<String> hashSet = new HashSet<>();
TreeSet<String> treeSet = new TreeSet<>();
for(String arg : args) {
hashSet.add(arg);
treeSet.add(arg);
}
System.out.print("Iterating over hashSet:");
for(String s : hashSet) {
System.out.print(" " + s);
}
System.out.println();
System.out.print("Iterating over treeSet:");
for(String s : treeSet) {
System.out.print(" " + s);
}
System.out.println();
}
}
Running the code example e.g. as java Sets one two one three one four five two
might print out:
Iterating over hashSet: four one two three five
Iterating over treeSet: five four one three two
The result illustrates how iterating over the hash-based set lists the items in a more or less random order, but iterating over the tree-based set listes them in ascending alphabetical order.
The generic classes HashMap
and TreeMap
are dictionary classes similar to unordered_map
and map
in C++ and dict
in Python. They store key-value pairs and offer efficient lookup
function based on the key of the key-value pairs. The key and value types can be different and
are given as two comma separated type parameters when using a dictionary class. E.g.
HashMap<String, Double>
would store String
keys and Double
values, and
TreeMap<Integer, ArrayList<String>>
would store Integer
keys and ArrayList<String>
values. In the latter case the values would thus be dynamic arrays, and this kind of a dictionary
structure might be useful e.g. if we wish to store several values under the same key.
The dictionary classs have the same difference as the corresponding set classes: HashMap
is
based on a hash table and maintains the key-value pairs in random order, whereas TreeMap
is
based on a balanced tree and maintains the key-value pairs in ascendingorder with respect to the
keys. Both HashMap
and TreeMap
offer e.g. the following member functions:
put(key, item)
: adds thekey
-value
pair, that isitem
underkey
, into this dictionary. If the key already had a value stored under it,item
will replace the previous value.putAll(map)
: adds all key-value pairs of the dictionarymap
into this dictionary (as if they were added individually usingput
).containsKey(key)
: returns aboolean
that tells whether this dictionary contains a value stored underkey
.containsValue(item)
: returns aboolean
that tells whether this dictionary contains a valueitem
under any key. Note: this may be slow as it probably scans key-value pairs one by one.get(key)
: return the value stored underkey
(ornull
if the dictionary does not contain a value stored underkey
).remove(key)
: remove the key-value pair whose key iskey
(if such exists).keySet()
: returns a set that contains the keys of all stored key-value pairs.entrySet()
: returns a set that contains all stored key-value pairs.The key-value pairs are stored internally in objects of type
Map.Entry<K, V>
, whereK
andV
are the key and value types. This is similar topair
in C++.Map.Entry
has the member functionsgetKey()
andgetValue()
for extracting the key and value stored in it.
TreeMap
also offers e.g. the following functions that rely on the order of the key-value pairs:
firstEntry()
: returns aMap.Entry
containing the key-value pair with the smallest key.lastEntry()
: returns aMap.Entry
containing the key-value pair with the slargest key.pollFirstEntry()
: returns and removes aMap.Entry
containing the key-value pair with the smallest key.pollLastEntry()
: returns and removes aMap.Entry
containing the key-value pair with the largest key.
The following example code demonstrates basic use of dictionaries by “indexing” the command line parameters based on their first characters: all command line parameters sharing the same first character are stored into a list that is stored using that first character as the key. We again use only some of the functions mentioned above.
import java.util.TreeMap;
import java.util.TreeSet;
class Index {
public static void main(String[] args) {
TreeMap<Character, TreeSet<String>> index = new TreeMap<>();
for(String arg : args) {
// Use the first character as a key, converting it always to upper case.
Character key = Character.toUpperCase(arg.charAt(0));
// Lookup the word set corresponding to this key. If it does not exist yet,
// (get returned null), create and store a new empty set under this key.
TreeSet<String> keyValues = index.get(key);
if(keyValues == null) {
keyValues = new TreeSet<>();
index.put(key, keyValues);
}
// Add the current word to the set that now must exist in the dictionary.
keyValues.add(arg);
}
// Iterate over the dictionary contents with a nested loop where the outer loop
// iterates over the keys and the inner loop iterates over the words stored in
// the set under that key.
System.out.println("Iterating over keys and their values:");
for(Character key : index.keySet()) {
System.out.print(" " + key + ":");
for(String s : index.get(key)) {
System.out.print(" " + s);
}
System.out.println();
}
// Iterate over the dictionary again, but this time with a simple loop that
// iterates over the key-value pairs. Here we might consider using the inferred
// var type as the type of the iterated key-value pairs is fairly complicated
// and long: Map.Entry<Character, TreeSet<String>>
System.out.println("Iterating over key-value-pairs:");
for(var keyValue : index.entrySet()) {
// We may extract the key and value from the key-value pair by calling getKey()
// and getValue(), respectively. We could also have simply printed the key-value
// pairs as such, because Map.Entry offers a reasonable toString implementation.
System.out.print(" " + keyValue.getKey() + ": ");
System.out.println(keyValue.getValue());
}
}
}
Running the example code as
java Index The Java programming language is related to C and C++ but is
organized rather differently with a number of aspects of C and C++ omitted
and a few ideas from other languages included
will print:
Iterating over keys and their values:
A: a and aspects
B: but
C: C C++
D: differently
F: few from
I: ideas included is
J: Java
L: language languages
N: number
O: of omitted organized other
P: programming
R: rather related
T: The to
W: with
Iterating over key-value-pairs:
A: [a, and, aspects]
B: [but]
C: [C, C++]
D: [differently]
F: [few, from]
I: [ideas, included, is]
J: [Java]
L: [language, languages]
N: [number]
O: [of, omitted, organized, other]
P: [programming]
R: [rather, related]
T: [The, to]
W: [with]
The preceding examples created the containers to be initially empty, without giving a constructor
parameter. The container classes usually have also a constructor that takes an existing compatible
container object as a parameter and initializes the new container with its contents (as if we had
used addAll
or putAll
right after constructing an initially empty container).
Container vs. hash code / comparison function
Containers based on a hash table compute hash values to the items (or dictionary keys) by calling
a member function hashCode()
. All Java objects have such a function, at least as a default
implementation inherited from Object
class (this is discussed later). The default
implementation is based on object identity only (and not the value represented by an object).
Most fundamental Java classes, such as Integer
, String
and so on, have a reasonable own
implementation of hashCode
. If you define a class that you want to use with hash based
containers, you should implement the hashCode
function for that class. This can be even quite
straight-forward to do, because a hash value can usually be computed by combining the hash values
of some suitably selected object member variables. If the member variable types already have
reasonable hashCode
implementations, a combined has value can be computed by the function
java.util.Objects.hash
. This function takes one or more objects as parameters and returns a
hash value based on combining their individual hash values. We will provide an example of this
later when discussing class inheritance.
Tree based containers maintain items (or dictionary keys) in ascending order. This requires that
the items must either readily support value comparison or that we provide a separate comparison
function as a constructor parameter when creating the container. This will be discussed later.
But note that Java’s fundamental value types (Integer
, String
, etc.) readily support
comparison based on “typical” ascending order.