Concept of Abstraction
The field of implementing large software became known through the so-called software crisis, in which the growth in the computational capabilities of computing equipment made it possible to implement increasingly complex software. This led to software projects that could not stick to their schedules and to software that was inefficient, overly complex and therefore hard or even impossible to maintain. Several solutions to the crisis have been attempted, both through software processes and tools and through programming language development. It is still worth noting that software projects remain susceptible to the same problems as when the term was coined in the late 1960s. In his Turing Award lecture in 1972, Edsger Dijkstra said: “as long as there were no machines, programming was no problem at all; when we had a few weak computers, programming became a mild problem, and now we have gigantic computers, programming has become an equally gigantic problem.” Large software products form a growing job market. Let’s first focus on one key element in making their implementation possible: utilizing abstraction.
Abstraction
The human ability to handle data is at its heart quite limited. Our working memory can handle only a small amount of information at a time; famously, its capacity is described as 7±2 units, although the kind of information being processed affects the figure. Here it is enough to note that we cannot hold very large sets of information in our working memory at once. We have therefore developed a rather refined tool for understanding and handling complex topics: abstraction. If you look around you, there are abstractions everywhere. Maps do not show every detail; they work by hiding the details that are not important to the user of the map. You do not need to know how the engine, transmission and brakes of your car work in order to drive it; you only need to use the controls. In languages we create new concepts and terms to be able to describe things, events and groups. Each of these relies on abstraction.
Abstraction
Simplifying or generalizing a concept in order to make handling it possible. The purposeful hiding and ignoring of details in order to highlight the important concepts, details or structures.
In abstraction we collect together things that belong together in some way and use a common name for the concept created. Programming relies on abstraction in many ways, in particular for managing a problem as a whole. We divide the problem into pieces that are small enough for one person to manage and simplify information handling through the use of abstractions. This purposeful omission of details in order to create an abstraction is called information hiding.
Levels of Abstraction
In programming, abstractions can be used on different levels of program code structure. The goal is to approach a complex problem by giving it structure so that we can build a solution to the problem as a whole. The starting point is the good old divide and conquer, i.e. the problem is divided into smaller pieces until a suitable level of detail is reached. Each piece can further be divided into its parts. On the other hand, we can approach the topic through specialization: a book is a concept that describes pages bound together, and a poem collection is a book whose pages contain poems. These two approaches to abstraction, i.e. the relationships “is-a” (a poem collection is a book) and “has-a” (a library has poem collections), are both essential from the point of view of programming. The use of different abstractions can be approached through a walk through history: programming languages have evolved to provide stronger abstractions for creating implementations. Such abstractions are data structures, modules, objects and components.
In the early days of computers, the hardware dictated what kinds of problems could be solved by programming. The first step towards any abstraction was the development of a way to write program code in a form approachable to humans: the assembly language.
The code still had no structure to support reading and maintaining it. This made writing any program that was even a little larger difficult to control. It is very difficult to see afterwards what data the program handles and where in the code the data is handled. The structure of assembly code is often described as spaghetti, and still today code that is confusing and difficult to follow is called spaghetti code.
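The “is-a” and “has-a” relationships from the book example can be sketched in Java roughly as follows. The class names Book, PoemCollection and Library and all of their members are hypothetical and chosen only to mirror the example; this is a minimal sketch, not a recommended design.

```java
import java.util.ArrayList;
import java.util.List;

// A book is modelled here simply as a title and a number of pages (assumption).
class Book {
    private final String title;
    private final int pageCount;

    Book(String title, int pageCount) {
        this.title = title;
        this.pageCount = pageCount;
    }

    String getTitle() { return title; }
    int getPageCount() { return pageCount; }
}

// "is-a": a poem collection is a book, expressed with inheritance.
class PoemCollection extends Book {
    private final List<String> poems = new ArrayList<>();

    PoemCollection(String title, int pageCount) {
        super(title, pageCount);
    }

    void addPoem(String poem) { poems.add(poem); }
    int poemCount() { return poems.size(); }
}

// "has-a": a library has poem collections, expressed with composition.
class Library {
    private final List<PoemCollection> collections = new ArrayList<>();

    void add(PoemCollection collection) { collections.add(collection); }
    int size() { return collections.size(); }
}
```

Because a PoemCollection is a Book, it can be used wherever a Book is expected, whereas a Library only holds references to its collections.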
Division into parts
The development of subprograms, functions and data structures made it possible to handle things through abstractions both on the level of data and functionality. Data structures collect information that hierarchically belongs together under a common name. Functionality that handles the data can in turn be collected under a named piece of the program that implements a single functionality, i.e. a function or a subprogram.
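As a minimal sketch of this step, the Java fragment below groups related values under one name and puts a single piece of functionality into a named function. The names Measurement, MeasurementFunctions and average are hypothetical. The fields are deliberately left freely accessible to mirror a plain data structure.

```java
// A data structure: values that belong together get one common name.
// The fields are not hidden, so any part of the program can read and modify them.
class Measurement {
    String sensorName;
    double[] readings;
}

class MeasurementFunctions {
    // A function: one named piece of functionality that operates on the data.
    static double average(Measurement m) {
        if (m.readings.length == 0) {
            return 0.0; // assumption: an empty series averages to zero
        }
        double sum = 0.0;
        for (double r : m.readings) {
            sum += r;
        }
        return sum / m.readings.length;
    }
}
```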
Encapsulation of Data
Collecting data and functionality together does not prevent the data from being handled freely from anywhere in the program, which can lead to hard-to-find errors and to difficulties if the functionality needs to be changed. As the structure of the program gets more complex, references to the data structures need to be defined hierarchically as well. Information hiding can be used to do exactly this: define the functionality that handles the data structure and collect it together with the data structure into a module. A module then offers an interface for handling the data, consisting of a collection of interface functions. The interface hides the implementation of the data structure behind the interface functions of the module.
The module hence looks different to its user and to the implementor of its functionality, i.e. the two see it at different levels of abstraction. The user is a programmer who needs some service offered by the module. What interests the user is the public interface of the module: whether its functions provide the functionality needed and how they should be called. The implementation itself does not matter to the user. The implementor in turn needs to implement the functionality of the module and is also responsible for making sure that everything the user needs is available in the public interface.
Hiding the information irrelevant to the user behind a defined interface is called encapsulation. Even though the interface approach makes designing the program more complex, it brings significant benefits. The implementation can be changed without affecting the user as long as the interface remains unchanged. In addition, the hidden data is not handled all around the program, which makes searching for errors and maintaining the code easier.
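One way to picture the split between user view and implementor view is a Java interface with a hidden implementation. The sketch below is only an illustration: the names IntStack and ListIntStack are hypothetical, and the choice of a list as the backing store is an assumption that could be changed without touching the interface.

```java
import java.util.ArrayList;
import java.util.List;

// Public interface of the module: everything a user needs to know.
interface IntStack {
    void push(int value);
    int pop();
    boolean isEmpty();
}

// One possible implementation; the backing list is hidden from users.
class ListIntStack implements IntStack {
    private final List<Integer> items = new ArrayList<>();

    @Override
    public void push(int value) {
        items.add(value);
    }

    @Override
    public int pop() {
        if (items.isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        // Remove and return the most recently added element.
        return items.remove(items.size() - 1);
    }

    @Override
    public boolean isEmpty() {
        return items.isEmpty();
    }
}
```

Code written against IntStack keeps working if ListIntStack is later replaced by, say, an array-based implementation, because the private field is never visible outside the class.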
Abstract Data Types
The next step in utilizing abstractions is to combine the data with the functionality through the object-oriented approach. An abstract data type ties the data and its implementation tightly together and makes it possible to use several elements of the same type, objects, as parts of the program. Conceptually, an abstract data type is a unit defined by the functionality of its interface and the internal implementation of the data type. Objects are the concrete data elements created from the abstract data type in the program, and they behave according to the functionality of the interface. Where a module offers an interface and information hiding, an abstract data type combines the data structure and the interface. Modules and objects also complement each other. Through modularity a program can be divided into components on a higher level, and objects can be used in their implementation. This creates a division into a static and a dynamic part of the program. The interfaces defined by the modules are the unchanging, static part of the program. Objects are then created as needed during the execution of the program. For example, there can be several different dates in a program at the same time. They all have the same public functionality, but their internal states differ from each other. An object is also an independent unit in the program: we think of each abstract data type as having its own responsibility.
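The date example can be sketched as a small Java class. SimpleDate and its methods are hypothetical names, and the range check is intentionally simplistic (it does not know month lengths); the point is only that every object shares the same public functionality while carrying its own internal state.

```java
import java.util.Locale;

// Hypothetical abstract data type for a calendar date.
class SimpleDate {
    // Internal state: each object has its own values.
    private final int year;
    private final int month;
    private final int day;

    SimpleDate(int year, int month, int day) {
        // Simplified validation (assumption): month lengths are not checked.
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            throw new IllegalArgumentException("invalid date");
        }
        this.year = year;
        this.month = month;
        this.day = day;
    }

    // Public functionality shared by all date objects.
    boolean isBefore(SimpleDate other) {
        if (year != other.year) {
            return year < other.year;
        }
        if (month != other.month) {
            return month < other.month;
        }
        return day < other.day;
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "%04d-%02d-%02d", year, month, day);
    }
}
```

Several SimpleDate objects can exist at the same time during execution, for example a start date and an end date, and they can be compared only through the public methods.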
Service Oriented View
One important change brought by the use of abstraction is the shift from a pure implementer view to services offered to the user. Interfaces, objects and their implementations form a software component. A component is an independent piece of the software that can be used as a part of a bigger unit. Java as a programming language is a good example of this thinking.
JavaBeans is one example of utilizing the component approach in a programming language.
Principle of Locality
On this course the focus is on learning to use a modular program structure in your own programs and, more generally, in the everyday development of larger programs. The course thus does not cover software design; for that there is the course COMP.SE.110 Software Design. In practice, however, even a largish program cannot be implemented without designing it, so some basic design principles are covered.
The aim of software design is to find a solution to a problem. From the point of view of modularity this means that suitable components and their connections are identified. A simplified view of software design consists of:
recognizing the components
defining their responsibilities
identifying the connections between components
defining interfaces
The division into modules can be done with either a top-down or a bottom-up approach. In the top-down approach the largest functional parts of the program, such as the user interface, the database, data handling and possible connections to other programs, are identified first. After this each part can in turn be divided into smaller units until pieces that can be implemented as modules, abstract data types and components are found. In the bottom-up approach known solutions to some subproblems are already available, and the implementation of the software can be started by combining them. The design can be, and often is, a combination of these approaches.
The different parts of a program communicate with each other and refer to each other. A reference between modules means that one module needs a service provided by another. The aim is to minimize the connections between different software components. Each reference between components adds to the complexity of the software and thus makes it more difficult to maintain and understand, which is exactly what abstractions are used in software design to avoid. When the connections between components are minimized, the structure of the program becomes simpler, and it becomes simpler still if the connections are kept unidirectional. If a group of strongly connected components that refer to each other a lot is identified, they can be collected behind a new common interface, which maintains so-called locality in the program.
Principle of Locality
Collecting and packaging modules strongly connected to each other behind a new and simplified interface in order to minimize connections and to keep the complexity of the program maintainable.
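As a rough illustration of locality, the Java sketch below packages three cooperating helpers behind one simplified interface. All names (SettingParser, Tokenizer, Validator, Converter, parseValue) and the "name value" line format are assumptions made for the example. The helpers are private nested classes, so the rest of the program can depend only on SettingParser.

```java
import java.util.OptionalInt;

// The single, simplified interface seen by the rest of the program.
class SettingParser {

    // Closely related helpers, hidden inside the component.
    private static class Tokenizer {
        String[] split(String line) {
            return line.trim().split("\\s+");
        }
    }

    private static class Validator {
        boolean isValid(String[] tokens) {
            // Expect exactly "name value" with a short non-negative integer value.
            return tokens.length == 2 && tokens[1].matches("\\d{1,9}");
        }
    }

    private static class Converter {
        int toNumber(String token) {
            return Integer.parseInt(token);
        }
    }

    private final Tokenizer tokenizer = new Tokenizer();
    private final Validator validator = new Validator();
    private final Converter converter = new Converter();

    // Returns the numeric value of a "name value" line, or empty if the line is malformed.
    OptionalInt parseValue(String line) {
        String[] tokens = tokenizer.split(line);
        if (!validator.isValid(tokens)) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(converter.toNumber(tokens[1]));
    }
}
```

A caller only writes something like new SettingParser().parseValue("width 42"), so the helpers can be reorganized freely without adding connections elsewhere in the program.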
Paying attention to the direction of the dependencies also simplifies the program structure: it is worth keeping the dependencies unidirectional. Cyclic references also make the implementation and testing of the components more difficult. Despite careful design, situations arise where a bidirectional dependency between components is unavoidable.
Programming languages are prepared for these situations in one way or another. For example, in C++ a situation where class A needs a service of class B and B in turn needs a service of A is solved by using a forward declaration (class A;). Java in turn recognizes classes and methods from the source code files, and hence types and methods can be used without any forward declaration.
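The Java behaviour can be seen in a small sketch with two classes that refer to each other. The names Customer and Order are hypothetical; note that Customer uses the type Order before Order is declared, and no forward declaration is needed.

```java
import java.util.ArrayList;
import java.util.List;

// Customer refers to Order, which is declared further down in the same file.
class Customer {
    private final String name;
    private final List<Order> orders = new ArrayList<>();

    Customer(String name) {
        this.name = name;
    }

    String getName() { return name; }
    void addOrder(Order order) { orders.add(order); }
    int orderCount() { return orders.size(); }
}

// Order refers back to Customer, forming a bidirectional dependency.
class Order {
    private final Customer customer;
    private final String product;

    Order(Customer customer, String product) {
        this.customer = customer;
        this.product = product;
    }

    String describe() {
        return product + " for " + customer.getName();
    }
}
```

The sketch compiles as it stands, but the two classes can no longer be understood or tested in isolation, which is the cost of a cyclic reference described above.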
On abstractions (duration 16:43)