Monday, October 22, 2018

Understanding Istio Ingress Gateway in Kubernetes

Traditionally, Kubernetes has used an Ingress controller to handle the traffic that enters the cluster from the outside. When using Istio, this is no longer the case. Istio has replaced the familiar Ingress resource with the new Gateway and VirtualService resources. They work in tandem to route traffic into the mesh. Inside the mesh there is no need for Gateways, since the services can access each other by a cluster-local service name.

So how does it work? How does a request reach its intended application? It is more complicated than one might think. Here is a drawing and a quick overview.

  1. A client makes a request on a specific port.
  2. The Load Balancer listens on this port and forwards the request to one of the workers in the cluster (on the same or a new port).
  3. Inside the cluster the request is routed to the Istio IngressGateway Service which is listening on the port the load balancer forwards to.
  4. The Service forwards the request (on the same or a new port) to an Istio IngressGateway Pod (managed by a Deployment).
  5. The IngressGateway Pod is configured by a Gateway (!) and a VirtualService.
  6. The Gateway configures the ports, protocol, and certificates.
  7. The VirtualService configures routing information to find the correct Service.
  8. The Istio IngressGateway Pod routes the request to the application Service.
  9. And finally, the application Service routes the request to an application Pod (managed by a Deployment).

Routing a Request through Istio Gateway to an Application

The Load Balancer

The load balancer can be configured manually or automatically through the service type: LoadBalancer. In this case, since not all clouds support automatic configuration, I'm assuming that the load balancer is configured manually to forward traffic to a port that the IngressGateway Service is listening on. A manually configured load balancer doesn't communicate with the cluster to find out where the backing pods are running, so we must expose the Service with type: NodePort; node ports are only available in the high port range, 30000-32767. Our LB is listening on the following ports.

  • HTTP - Port 80, forwards traffic to port 30080.
  • HTTPS - Port 443, forwards traffic to port 30443.
  • MySQL - Port 3306, forwards traffic to port 30306.

Make sure your load balancer configuration forwards to all your worker nodes. This will ensure that the traffic gets forwarded even if some nodes are down.
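As an illustration, a manual load balancer configuration could look something like the sketch below. I'm using HAProxy syntax here, and the worker addresses are placeholders.

# haproxy.cfg sketch, forwards port 443 to nodePort 30443 on every worker
# (repeat the same pattern for 80 -> 30080 and 3306 -> 30306)
frontend https-in
    bind *:443
    mode tcp
    default_backend workers-https

backend workers-https
    mode tcp
    server worker1 10.0.0.11:30443 check
    server worker2 10.0.0.12:30443 check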

The IngressGateway Service

The IngressGateway Service must listen on all the above ports to be able to forward the traffic to the IngressGateway pods. We use the Service's port mapping to bring the port numbers back to their default values.

Please note that a Kubernetes Service is not a "real" service; since we are using type: NodePort, the request will be handled by the kube-proxy provided by Kubernetes and forwarded to a node with a running pod. Once on the node, an iptables configuration forwards the request to the appropriate pod.

# From the istio-ingressgateway service
  ports:
  - name: http2
    nodePort: 30080
    port: 80
    protocol: TCP
  - name: https
    nodePort: 30443
    port: 443
    protocol: TCP
  - name: mysql
    nodePort: 30306
    port: 3306
    protocol: TCP

If you inspect the service, you will see that it defines more ports than I have described above. These ports are used for internal Istio communication.
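If you want to see them, something like this lists the name, port, and node port of everything the service exposes (port names and numbers may differ between Istio versions):

$ kubectl -n istio-system get service istio-ingressgateway \
  -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\t"}{.nodePort}{"\n"}{end}'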

The IngressGateway Deployment

Now we have reached the most interesting part of this flow, the IngressGateway. This is a fancy wrapper around the Envoy proxy, and it is configured in the same way as the sidecars used inside the service mesh (it is actually the same container). When we create or change a Gateway or VirtualService, the changes are detected by the Istio Pilot controller, which converts this information to an Envoy configuration and sends it to the relevant proxies, including the Envoy inside the IngressGateway.

Don't confuse the IngressGateway with the Gateway resource. The Gateway resource is used to configure the IngressGateway.

Since container ports don't have to be declared in Kubernetes pods or deployments, we don't have to declare the ports in the IngressGateway Deployment. But, if you look inside the deployment, you can see that a number of ports are declared anyway (unnecessarily).
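If you want to verify this yourself, something like the following lists the container ports declared in the deployment (assuming the default deployment name):

$ kubectl -n istio-system get deployment istio-ingressgateway \
  -o jsonpath='{.spec.template.spec.containers[0].ports[*].containerPort}'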

What we do have to care about in the IngressGateway Deployment is SSL certificates. To be able to reference the certificates from the Gateway resources, make sure that they are mounted properly.

# Example certificate volume mounts
volumeMounts:
- mountPath: /etc/istio/ingressgateway-certs
  name: ingressgateway-certs
  readOnly: true
- mountPath: /etc/istio/ingressgateway-ca-certs
  name: ingressgateway-ca-certs
  readOnly: true

# Example certificate volumes
volumes:
- name: ingressgateway-certs
  secret:
    defaultMode: 420
    optional: true
    secretName: istio-ingressgateway-certs
- name: ingressgateway-ca-certs
  secret:
    defaultMode: 420
    optional: true
    secretName: istio-ingressgateway-ca-certs

The Gateway

The Gateway resources are used to configure the ports for Envoy. Since we have exposed three ports with the service, we need these ports to be handled by Envoy. We can do this by declaring one or more Gateways. In my example, I'm going to use a single Gateway, but it could be split into two or three.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: default-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:

  - hosts:
    - '*'
    port:
      name: http
      number: 80
      protocol: HTTP

  - hosts:
    - '*'
    port:
      name: https
      number: 443
      protocol: HTTPS
    tls:
      mode: SIMPLE
      privateKey: /etc/istio/ingressgateway-certs/tls.key
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt

  - hosts: # For TCP routing this field seems to be ignored, but it is matched
    - '*'  # with the VirtualService. I use '*' since it will match anything.
    port:
      name: mysql
      number: 3306
      protocol: TCP

Valid protocols are HTTP|HTTPS|GRPC|HTTP2|MONGO|TCP|TLS. More info about Gateways can be found in the Istio Gateway docs.

The VirtualService

Our final interesting resource is the VirtualService. It works in concert with the Gateway to configure Envoy. If you only add a Gateway, nothing will show up in the Envoy configuration, and the same is true if you only add a VirtualService.

VirtualServices are really powerful; they enable the intelligent routing that is one of the main reasons to use Istio in the first place. However, I'm not going into that in this article, since it is about the basic networking and not the fancy stuff.

Here's a basic configuration for an HTTP(S) service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: counter
spec:
  gateways:
  - default-gateway.istio-system.svc.cluster.local
  hosts:
  - counter.lab.example.com
  http:
  - match:
    - uri:
        prefix: /
    route:
    - destination:
        host: counter
        port:
          number: 80

Now, when we have added both a Gateway and a VirtualService, the routes have been created in the Envoy configuration. To see this, you can kubectl -n istio-system port-forward istio-ingressgateway-xxxx-yyyy 15000 and check out the configuration by browsing to http://localhost:15000/config_dump.

Note that the gateway specified, as well as the host, must match the information in the Gateway. If they don't, the entry will not show up in the configuration.

// Example of http route in Envoy config
{
  name: "counter:80",
  domains: [
    "counter.lab.example.com"
  ],
  routes: [
    {
      match: {
        prefix: "/"
      },
      route: {
        cluster: "outbound|80||counter.default.svc.cluster.local",
        timeout: "0s",
        max_grpc_timeout: "0s"
      },
      ...

Here's a basic configuration for a TCP service.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: mysql
spec:
  gateways:
  - default-gateway.istio-system.svc.cluster.local
  hosts: # The host field seems to only be used to match the Gateway.
  - '*'  # I'm using '*'; the listener created is listening on 0.0.0.0.
  tcp:
  - match:
      - port: 3306
    route:
    - destination:
        host: mysql.default.svc.cluster.local
        port:
          number: 3306

This will result in a completely different configuration in the Envoy config.

listener: {
  name: "0.0.0.0_3306",
  address: {
    socket_address: {
      address: "0.0.0.0",
      port_value: 3306
    }
  }
},

Application Service and Deployment

Our request has now reached the application Service and Deployment. These are just normal Kubernetes resources, and I will assume that if you have read this far, you already know all about them. :)

Debugging

Debugging networking issues can be difficult at times, so here are some aliases that I find useful.

# Port forward to the first istio-ingressgateway pod
alias igpf='kubectl -n istio-system port-forward $(kubectl -n istio-system get pods -listio=ingressgateway -o=jsonpath="{.items[0].metadata.name}") 15000'

# Get the http routes from the port-forwarded ingressgateway pod (requires jq)
alias iroutes='curl --silent http://localhost:15000/config_dump | jq '\''.configs.routes.dynamic_route_configs[].route_config.virtual_hosts[]| {name: .name, domains: .domains, route: .routes[].match.prefix}'\'''

# Get the logs of the first istio-ingressgateway pod
# Shows what happens with incoming requests and possible errors
alias igl='kubectl -n istio-system logs $(kubectl -n istio-system get pods -listio=ingressgateway -o=jsonpath="{.items[0].metadata.name}") --tail=300'

# Get the logs of the first istio-pilot pod
# Shows issues with configurations or connecting to the Envoy proxies
alias ipl='kubectl -n istio-system logs $(kubectl -n istio-system get pods -listio=pilot -o=jsonpath="{.items[0].metadata.name}") discovery --tail=300'

When you have started the port-forwarding to the istio-ingressgateway with igpf, you can, for example, use iroutes to inspect the routes that have been configured.

Conclusion

Networking with Kubernetes and Istio is far from trivial; hopefully this article has shed some light on how it works. Here are some key takeaways.

To Add a New Port to the IngressGateway

  • Add the port to an existing Gateway or configure a new one.
  • If it's a TCP service, also add the port to the VirtualService; this is not needed for HTTP, since HTTP matches on layer 7 (domain name, etc.).
  • Add the port to the ingressgateway Service. If you are using service type: LoadBalancer, you are done.
  • Otherwise, open the port in the load balancer and forward traffic to all worker nodes.

To Add Certificates to an SSL Service

  • Add the TLS secrets to the cluster (an example follows below).
  • Mount the secret volumes in the ingressgateway.
  • Configure the Gateway to use the newly created secrets.
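For example, the default certificate secret mounted earlier can be created with kubectl; the key and certificate file names here are just placeholders:

# Create the secret backing /etc/istio/ingressgateway-certs
$ kubectl create -n istio-system secret tls istio-ingressgateway-certs \
  --key privkey.pem --cert fullchain.pem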

Sunday, March 12, 2017

A Short Introduction to Makefiles

Makefiles are really good at one thing, managing dependencies between files. In other words, make makes sure all files that depend on another file are updated when that file changes.

We tell make how to do this by declaring rules. A typical rule looks like this:

A Simple Makefile

# Makefile
# Create bundle.js by concatenating jquery.js lib.js and main.js
bundle.js: jquery.js lib.js main.js
 cat $^ > $@

There are three parts to this rule:

  • The target, bundle.js, before the colon (:).
  • The prerequisites (what the target depends on), jquery.js lib.js main.js, after the colon (:).
  • The command, cat $^ > $@, on the next line after a leading tab, (\t).

There are two "automatic" variables in this command.

  • $@ - filename representing the target, in this case bundle.js.
  • $^ - filenames representing the list of the prerequisites (with duplicates removed).

"Automatic" means that the variables are automatically populated with relevant filenames. This will make more sense when we get into patterns later.

Here are some more variables that are useful.

  • $< - filename representing the first prerequisite.
  • $? - filenames representing the list of the prerequisites that are newer than the target.
  • $* - filename representing the stem of the target, in the above case bundle.

The Make Manual contains the full list of automatic variables.
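Here is a minimal sketch that prints the automatic variables for a pattern rule; the files foo.c and common.h are just assumptions for illustration (run it with make foo.o):

# Print the automatic variables for a pattern rule
%.o: %.c common.h
 @echo 'target: $@, first prereq: $<, newer prereqs: $?, stem: $*'
 @touch $@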

Execution

Running make with the above Makefile results in the following execution.

$ make
cat jquery.js lib.js main.js > bundle.js

$ make
make: 'bundle.js' is up to date.

make runs the first target it finds in the file if none is given on the command line. In this case it is the only target.

The second run didn't do anything since bundle.js is up to date. Being up to date means that the last-modified time of bundle.js is newer than any of its prerequisites' last-modified times. Simple but powerful! When we create new targets, all we have to worry about is making sure that each target knows what files it depends on, and what files those depend on, and so on.

It is possible to enter many targets on the left of the colon (:). make will treat them as separate rules, and the automatic variables will make sure that the correct files are built.

But, since make treats them as separate rules, it will only build the first of them, the default target.

# Makefile
bundle.js bundle2.js: jquery.js lib.js main.js
 cat $^ > $@

$ make
make: 'bundle.js' is up to date.

If we want to build bundle2.js, we can explicitly tell make to do it by giving the target as a command-line parameter.

$ make bundle2.js
cat jquery.js lib.js main.js > bundle2.js

To get make to build both targets at once, we need to add a new .PHONY target.

# Makefile
.PHONY: bundles
bundles: bundle.js bundle2.js

bundle.js bundle2.js: jquery.js lib.js main.js
 cat $^ > $@

Running make now results in (after removing bundle*)

$ make
cat jquery.js lib.js main.js > bundle.js
cat jquery.js lib.js main.js > bundle2.js

A .PHONY: target is a target without a corresponding file for make to check the last-modified time on. This means the target will always be run, forcing make to check if all the target's prerequisites need to be built. The .PHONY label is not strictly necessary. If it is left out, make will check to see if there is a file called bundles, and since there isn't one, it will build it anyway.

Here's an illustration:

# Makefile
build:
 echo 'Running build'

# Makefile2
.PHONY: build
build:
 echo 'Running build'

$ touch build
$ make
make: 'build' is up to date.
$ make -f Makefile2
echo 'Running build'

Marking a target that doesn't represent a file as .PHONY: is easy to do and avoids annoying problems once your Makefile grows.

.PHONY: clean

Conventionally every Makefile contains a clean target to remove all the artifacts that are built. In the above case it would contain something like:

# Makefile
.PHONY: clean
clean:
 rm -f bundle*.js

make clean will now clean out all files created by the Makefile.

Directories

Directories in make usually need a bit of special treatment. Let's say we want the bundles above to end up in a build directory. The following Makefile illustrates a problem.

# Makefile
bundles: build/bundle.js build/bundle2.js

build/bundle.js build/bundle2.js: jquery.js lib.js main.js
 cat $^ > $@

Running make illustrates the problem:

cat jquery.js lib.js main.js > build/bundle.js
/bin/sh: build/bundle.js: No such file or directory
make: *** [build/bundle.js] Error 1

The directory is not automatically created by cat. There are three ways to solve this and one is better than the others.

  1. Add mkdir -p to all rules creating files in the directory.
  2. Add a prerequisite to create the directory on the bundles target.
  3. Add an ordering prerequisite (|) to the rules creating the files in the directory.

1. is not good because the directory would be created more than once, once for each bundle (this is why the -p is needed). 2. is not good because the build directory is not really a prerequisite of bundles; it is a prerequisite of the files created inside it. 3. is good because the build directory is clearly a prerequisite of the rule that creates the bundles in this directory.

The reason we have to use an ordering prerequisite (|) instead of a normal prerequisite is that cat would fail otherwise; with a normal prerequisite, $^ would include the build directory. Here's the resulting good Makefile.

# Makefile
bundles: build build/bundle.js build/bundle2.js

build:
 mkdir build

build/bundle.js build/bundle2.js: jquery.js lib.js main.js | build
 cat $^ > $@

clean:
 rm -rf build

Patterns

Now, we know the basics of Makefiles. We can create rules with targets, prerequisites and commands that are run when needed. But, we have been working with named files all this time. This works fine for small examples like above, but when we have hundreds of files this quickly gets out of hand. Patterns to the rescue.

Let's say that we have a bunch of images that we would like to optimize by running them through an optimizer. The images are in the images/ directory and the optimized images are built into build/images. The naive (and not working) way to do this is shown below. (I'm faking optimize with a simple copy, cp.) The % sign is glob matched with the part of the filename that is not literal.

# Makefile (NOT WORKING)
optimize: build/images/*

build/images/%: images/% | build/images
 cp $< $@

build/images:
 mkdir -p $@

To see why this is not a viable Makefile, we try to run it with make.

$ make
mkdir -p build/images
cp images/a.png build/images/*.

$ tree build
build/
└── images
    └── *

What is going on here? Why is only one image copied, and why is it copied as build/images/*? The problem is that the target files don't exist yet, so the * is interpreted literally. If we copy the files into the build directory and touch the source files, it works the way we want.

# Copy the image directory into build
$ cp -r images build
# Touch the original images
$ touch images/*
# Build works since build/images/* evaluates to the list of images
$ make
cp images/a.png build/images/a.png
cp images/b.png build/images/b.png

Here is the main rule to know about patterns: the target file list has to be created from the available source files. To do this we have the help of a number of functions, including wildcard and shell. shell allows us to call anything that we can call from the shell. This is very powerful!
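For example, here are two ways to build a list of source files; the src directory is just an assumption:

# wildcard expands a glob into a list of filenames
sources := $(wildcard src/*.js)

# shell runs an arbitrary shell command and captures its output
sources_too := $(shell find src -name '*.js')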

How do we solve the above problem? We can do this by getting a list of source images and transforming this list into a list of target images. This is easily done.

# Makefile

# 1. Get the source list of images
images := $(wildcard images/*.png)

# 2. Transform the source list into the target list
target_images := $(images:%=build/%)

# 3. Our default target, optimize, depends on all the target_images
optimize: $(target_images)

# 4. Build the targets from the sources, make sure build/images exist
$(target_images): build/% : % | build/images
 cp $< $@

build/images:
 mkdir -p $@

The first line introduces both variables and functions.

Variables can be declared in a number of ways, but the :=-declaration is the simplest. It evaluates the value on the right and sets the value on the left to the result, like variables in most programming languages.

Functions are called with the $() construct, and wildcard is a function that evaluates a shell filename pattern and returns a list of filenames.

The full line above populates images with the .png files from the images directory.

The second line converts the source images into the target images. Variables are evaluated the same way as function calls, with the $() construct. By adding a substitution reference, :pattern=replacement, after the variable name, we can substitute one pattern for another. Example:

files := "src/a.java src/b.java"
# Pattern replaces the files into "target/a.class target/b.class"
class_files := $(files:src/%.java:target/%.class)

The third line tells make that our optimize target depends on all the targets existing. This makes sure that all the targets are built.

The fourth line sets up the targets $(target_images) and their prerequisites with a static pattern rule. The pattern does the opposite of the variable substitution reference above; it deconstructs a single target into the source it depends on. The final part of this line is an order prerequisite on the rule that creates the directory.

A Recipe for Creating Makefiles

  • Create a list of targets that you want to create from the sources. You have the full power of bash, python, awk, etc. at your disposal.
  • Create a static pattern rule to convert a single target into the source it can be created from.
  • Add order prerequisites to make sure directories are automatically created.
  • Add a callable target that depends on all the target files that you want to create.

Commented Example

Here's a more exotic example of what you can use make for. We have a directory of JavaScript source files in lib. The corresponding test files are in test. There may be multiple directories below both lib and test. The test files are named like the source files with an added .spec after the stem of the filename.

We want to use a makefile to help us run only the tests that are relevant based on the files that are changed. To keep track of what tests have been run we're going to use marker files and touch them every time a test is run.

# Create the list of test files by using the shell function and find
test_files := $(shell find test -name '*.spec.js' -print)

# Convert the test files into marker files with variable substitution
# A marker file looks like this: tmp/test/models/person.spec.marker
marker_files := $(test_files:%.js=tmp/%.marker)

# Do the same thing for the test directories
test_dirs := $(shell find test -type d -print)

# The marker directories have their normal names, no special ending
marker_dirs := $(test_dirs:%=tmp/%)

# test is the default target
.PHONY: test
test: $(marker_files)

# The marker files order depend on the marker directories
$(marker_files): | $(marker_dirs)

# Marker files depend on the source files
# Deconstruct a marker file into a source file
$(marker_files): tmp/test/%.spec.marker : lib/%.js

# Marker files depend on the test files
# Deconstruct the marker file into a test file
# When any prerequisite changes, run the tests and then touch the marker file
$(marker_files): tmp/%.marker : %.js
 mocha $<
 @touch $@

# Create the marker dirs
$(marker_dirs):
 @mkdir -p $@

# Clean the project by removing the entire tmp directory
.PHONY: clean
clean:
 rm -rf tmp

Now whenever you run make, it will run only the relevant tests.

# Modify a test file
$ touch test/models/passbook.spec.js
$ make
mocha test/models/passbook.spec.js
  passbook
    ✓ generates pass strip image names
    ✓ doesnt crash with no store number
  2 passing (21ms)

# Modify a source file
$ touch lib/models/passbook.js
$ make
mocha test/models/passbook.spec.js
  passbook
    ✓ generates pass strip image names
    ✓ doesnt crash with no store number
  2 passing (22ms)

Makefiles are really good at one thing: building only stale files. If that is our problem, we should give make a try.

Wednesday, March 09, 2016

Programming an HS-1969

We are all programmers, even you who don't consider yourselves programmers. We are programmers of the hardest computer of all, the Homo Sapiens, ourselves!

Aha

14 years ago my son Rasmus was born. It was a difficult time, partly because he had colic, but mostly because I couldn't understand what this little critter wanted. But, one day while I was changing his diaper he said something that sounded like Aha! Aha, I thought out loud and he reacted with a smile and said Aha, again. This little kid had picked up that every once in a while I actually understood what he was trying to communicate and when I did I said Aha! He liked it so much that he learned to say Aha himself almost before he could say anything else.

Insight

We all know what an insight is and that it is a great feeling. Having an Aha-moment feels great! A lot of human progress (all of it?) has its roots in Aha moments.

Einstein

Einstein is known as a very intuitive scientist. He had a lot of insights, and not only the special theory of relativity. He proposed the quantum theory of light and the link between mass and energy, and he got the Nobel Prize for the photoelectric effect.

When he was shaving in the mornings he always shaved very slowly, because he often had Aha-moments while shaving and was afraid to cut himself with the razor.

Examples of Insights

An insight can help with a lot of things:

  • The punchline of a joke
  • The solution to a crossword puzzle, riddle or rebus.
  • Understanding why people behave the way they do.
  • Resolving inconsistencies in our thinking.
  • Realizing that one problem is similar to another.

Understanding

More than anything an insight is understanding. It is when the pieces fall together and we finally get how something or someone works.

A Doh-moment is also an Aha-moment, but it is when you realize something trivial that you believe you should have known all along. An example can be a song you have been singing your whole life, when suddenly you realize that you have misunderstood it all along. Here is a personal example:

Jimi Hendrix sings: Excuse me while I kiss the sky,
not: Excuse me while I kiss this guy.

Even though the latter would have been more progressive :)

Some Insight Problems

What historical person does "Horobod" symbolize?

A window washer fell from a 40-foot ladder without hurting himself. How is this possible?

Thiss sentence has thre errors? What are they?

Fermat's Conjecture

Fermat conjectured his theorem in 1637 in the margin of Arithmetica. He wrote,


It is impossible to separate a cube into two cubes
or a fourth power into two fourth powers, or in general 
any power higher than the second into two like powers. 
I have discovered a truly marvelous proof of this, 
which this margin is too narrow to contain.

It took 358 years for mankind to come up with this proof, and Andrew Wiles spent 7 years of his research time to prove it.

Two More Insight Problems

Turn the pyramid upside down by moving three coins.

Draw four straight lines through all the dots without lifting the pen.

When Do We Get Insights? Bath, Bed, and Bus

Insights can come at any time, but most often when we are not actively focusing on the problem. Wittgenstein famously said that the key to creative thinking is "the three B's: Bath, Bed, and Bus".

Archimedes got his Eureka-moment when he was lowering himself into a bath and realized that he could measure the volume of an irregular body by lowering it into water and measuring the amount of water that flowed out.

Edison used to take working naps. He would sit down on a chair with two metal balls in his hands and rest. When he fell asleep the balls would fall out of his hands and onto the steel pan he had placed below himself. He was trying to trigger insights which he often got when falling asleep.

Poincaré got one of his greatest ideas when he was about to get on a bus.


At the moment when I put my foot on the step the idea came to me, without
anything in my former thoughts seeming to have paved the way for it, that the
transformations that I had used to define the Fuchsian functions were identical
with those of non-Euclidean geometry.

How Do We Get Insights?

In order to get insight about something you have to prepare for it. It is impossible to get insights into something you know nothing about.

The Four Stages of Creativity

Preparation means to learn about something, to prime your brain. Then do something else. Here are two ways to do this. 1) Study until you get stuck, until you reach an impasse. 2) Study a little every day to keep the subject percolating in your mind.

Incubation starts when we walk away from the problem. It is when our unconscious mind takes over and keeps on working. Some good ways to let go of a problem are to exercise, walk the dog, sleep, relax!

Illumination is the moment of insight.

Verification is optional for many types of insight, but when doing science it is essential. It is when you prove that the insight is consistent with reality.

Your Memory Bank

It is what you have in your memory bank, what you can 
recall instantly that is important. If you have to look it up,
it is useless for creative thinking!
-- Linus Pauling

You cannot have an insight about things that you don't know anything about. If you don't have the raw material, you have nothing to work with. You cannot Google for insights. Learn everything you can about a subject and then let go, relax!

Deep Learning

Deep learning is more than just knowing things. It is about understanding things. Insights are crucial for this but so is knowing a lot of things.

Working Memory and Long-Term Memory

Working memory, also known as short-term memory, is like a blackboard. It is very limited, and things have to be erased before we can put something else on it.

Long-term memory, on the other hand, is surprisingly large. We don't know if it has any limit at all.

Learning is essentially to move stuff from working memory into long-term memory.

  • Encoding changes the input from our senses into a format the brain can store.
  • Consolidation recodes the memory to fit with other things that we know. This is done largely unconsciously.
  • Re-consolidation is when we recall a memory by reflecting on it or actually retrieving it. This re-creates the memory and also changes it.

Focused Mode and Diffused Mode

Our brain works in two modes, focused mode and diffused mode. Focused mode allows us to learn in a sequential step-by-step way. This is the mode we need when we are encoding and learning new things. Diffused mode is when we are not focusing on the problem, when our brain is consolidating. This is when we have insights.

Chunk

A chunk is a piece of information related to prior knowledge. To learn something is to create a new chunk. The more we learn about something the bigger the chunk gets and the easier it is for us to relate it to other chunks. A chunk can be kept in working memory as a single piece, without us having to bring in all the details about it.

Steps to Form a Chunk

  1. Focus your undivided attention on the task.
  2. Understand, figure out what is the main idea and relate it to what you know.
  3. Test yourself to verify that you understand.
  4. Gain context by figuring out not only how but when to use it.

Relate Chunks

To gain better understanding try to relate similar chunks together. A good way to do this is by using metaphors and similes. A simile is when we say something is like something else: Strong as an ox. A metaphor is when we say something is something else: You are my sunshine.

Metaphor transforms the strange into the familiar.
--Twyla Tharp, The Creative Habit

The Theory of Disuse (Forget-to-Learn)

Any chunk of memory has two characteristics, storage strength and retrieval strength. Storage strength is how well the item is learned. Retrieval strength is how accessible the memory is at the time.

Storage strength increases monotonically, while retrieval strength varies with context.

Both retrieval strength and storage strength increase with use.

Retrieval is key. Retrieving causes us to re-learn the memory. And, the harder it is to retrieve the deeper it gets stored in memory. More pain, more gain!

If we see the mind as a forest, a newly learned memory is somewhere in the forest. If we go to retrieve the memory, a path will be created to it. The more you retrieve it the wider the path gets. If we instead look it up, the path doesn't get created.

Memory Strength Matrix

In this 8-minute video, Destin from http://smartereveryday.com goes through all the states a memory can have.

  • When he starts, he knows how to ride a normal bike well. Strong storage and retrieval strength.
  • After a bit of trying, he learns how it works to ride a reversed bike. Weak storage and retrieval strength.
  • He manages to ride the reversed bike but falls as soon as he is distracted. Strong retrieval and weak storage.
  • When he then tries to ride a normal bike again, he cannot do it. Even though riding a normal bike is strongly stored, it has weak retrieval strength.

Spaced Repetition

Repeated retrieval is the mother of all learning, but it is not good to just repeat over and over without a pause, because then we are leaning too much on working memory. Pausing between repetitions is good for two reasons:

  1. It gives the diffuse mode time to work.
  2. It lets us forget. Remember, we have to forget to learn.

The most efficient way to learn something is to space it out into intervals.

How long should we wait before we try to retrieve a memory? The longer we wait, while not totally forgetting it, the better. The harder the memory is to retrieve the deeper it gets stored.

So what is the optimal time span? Of course scientists have a solution for this. :)

If you have a test and have decided to study for it three times, the optimal time for the second study session depends on how far away the test is. The last study session should always be the day before the test.

But, spacing out is not the only important thing, context is also important.

Context

Context is about the circumstances under which you are studying.

Location matters when you are studying. In one experiment, researchers let two groups of people study vocabulary under water. One group had to take the test under water and the other group had to take the test on land. The group that was submerged during the test got better results than the group that was on land.

In another study, the subjects studied after smoking marijuana. Same result there: the group that was also stoned during the test got better results than the group that was not.

And the same thing again with music. If students were allowed to listen to the same music they had used while studying, they got better results. It does not matter what kind of music it is; Mozart is not better than AC/DC. Interestingly, not listening to music didn't have this effect; silence does not seem to provide a context.

Variation

You may be thinking, "How in the world am I going to get my teacher to allow me to take my tests stoned, under water, while listening to AC/DC?". That is not the point; the point is variation. If we learn under varying conditions, our brain gets better at retrieving the information under conditions where we haven't practiced. And you can vary anything:

  • The color of the paper you are writing on.
  • Your mood, are you happy or sad?
  • Inside or outside
  • Morning or evening
  • Sitting still or exercising
  • etc.

Interleaving

Studying the same thing over and over again is called massed practice. It is commonly believed to be good, because it feels like you are learning fast, but it's an illusion. It is much more effective to interleave your practice, even though it doesn't feel that way.

Don't study the same thing over and over; vary the tasks. Don't just learn to calculate the area of a circle over and over; vary the practice with different figures.

In one study, children were told to practice throwing bean bags. One group practiced at a distance of three feet. Another group practiced from two and four feet. The day after, they had a competition from three feet. The group that had used interleaved practice won easily, even though they had never practiced at this distance.

Desirable Difficulties

The harder it is for us to retrieve the information, the better we learn. It is better to attempt to retrieve and be wrong than not to attempt at all, as long as we get feedback about what the correct answer is.

Think of learning as exercising. The harder it is, the stronger we get. No pain, no gain!

Testing

Testing is a great way to learn mainly for two reasons.

  1. It tells you what you know and don't know.
  2. It is an extreme form of retrieval. It focuses our mind, because when we really want to retrieve the information we try extra hard.

It is even helpful to pre-test what you haven't learned yet. This tells your brain that you are interested in this information and makes it more receptive and focused when you study.

Tests come in varying degrees of difficulty.

From easy to hard:

  • Multiple choice
  • Fill in the blank
  • Reply with a sentence
  • Write an essay

Classroom Testing

Multiple studies have shown that students in classes with many tests, one per week, get better grades than students who only get two tests per semester. A full grade better on average!

Self-Testing

It is not necessary to have formal tests. We can verify that we know by asking ourselves and answering questions.

  • Do I understand what this means?
  • What are the basic ideas in this text?
  • Can I explain this to someone else?
  • How does it relate to what I already know?

Sleep

Sleep is very good for our brain primarily for two reasons. First, being awake creates toxic products in our brain and sleeping cleans them out. Second, while we sleep our brain tidies up and removes unimportant ideas while simultaneously strengthening ideas that it believes are important to you.

Shallow Learning or Perceptual Learning

Shallow learning, or perceptual learning, is learning without understanding. It is pattern recognition, and it is great for categorization. Perceptual learning is active; our eyes and other senses search for the right clues automatically and tune themselves. Here are some examples.

Chicken Sexing

Determining the sex of a chicken is very difficult, and the experts who can do it cannot describe how they do it in a way that is understandable to an outsider. So how do you learn? By example. Make a wild guess. After each guess, the master chick-sexer gives you feedback. Yes, no, no, yes. And eventually, you just start to make correct guesses without knowing how.

Flight Training

When a pilot learns to fly by instruments, there are six instruments he needs to know: Airspeed Indicator, Attitude Indicator, Altimeter, Vertical Speed Indicator, Heading Indicator, and Turn Coordinator.

When novices try to fly by instruments, they have a hard time understanding what is going on. While they focus on one instrument, the other instruments change, and it is difficult to get an understanding of what it all means. Expert pilots, on the other hand, quickly glance at the instruments and instantly know what is going on.

With simple flash-card training, a novice can get as good at instrument reading in one hour as what normally takes a thousand hours. This training takes advantage of the perceptual learning ability our brain has for quickly categorizing information.

Art Recognition

Flash-cards can also be used to learn to categorize painting styles. Without having any other knowledge of the paintings or the painters we can learn to categorize a painting as surrealism, minimalism, or any other style by just practicing over and over.

Obstacles

The biggest obstacle to our learning is ourselves. We have a tendency to do what feels good rather than what is good, and it is easy to fool ourselves into thinking that we know something when we actually don't.

Richard Feynman once said:

The first principle is that you must not fool yourself - and you are the
easiest person to fool!

So how do we fool ourselves?

Illusions of Understanding

Re-reading

Reading a text over and over gives us the illusion that we know the text since we become familiar with it. We recognize the text as we read it. This is not the same as knowing it. The solution is retrieval practice!

Highlighting

Highlighting text is another way we fool ourselves. If we look at a page which is highlighted all over, it is easy to think that we already know it. The solution is retrieval practice!

Looking at the Answer

If we are solving problems, it is very easy to look at the answer before we have actually tried to solve the problem. We look at the answer and we tell ourselves that we could have solved this without looking. The solution in this case is to actually try to solve the problem. Even if we fail to solve it, we learn better than by just looking at the answer.

Googling

By Googling instead of trying to retrieve a memory, we rob our brain of the extra storage and retrieval strength that comes with retrieval. It is better to keep trying, to let our diffuse mode go to work and let the memory pop into our head when we least expect it.

Echoing Other People's Words

If you know a text verbatim, in exactly the same words used by someone else, odds are that you don't really know what you are saying. If you cannot explain it in your own words, you are probably fooling yourself.

Distractions

Another big obstacle to learning is distractions. The brain needs focus time in order to store memories deep enough for later retrieval.

Studies have shown that students who learn while watching TV at the same time may actually know it quite well immediately after studying. But, when tested later their retrieval ability was abysmal.

If you have problems focusing while studying, the pomodoro technique may help you. Set a timer for an amount of time, 20 minutes or half an hour, and vow not to be distracted until the timer rings.

Procrastination

Procrastination is the avoidance of doing a task which needs to be accomplished. It is the practice of doing more pleasurable things in place of less pleasurable ones, or carrying out less important tasks instead of more important ones, thus putting off impending tasks to a later time.

Start

The key to overcoming procrastination is to Just Start. Very often the anticipation of doing something is worse than actually doing it.

The writer Dorothy Parker once said:

Writing is the art of applying the ass to the seat.

Solutions to Insight Problems

What historical person does "Horobod" symbolize? Robin Hood

A window washer fell from a 40-foot ladder without hurting himself. How is this possible? He fell from the bottom step.

Thiss sentence has thre errors? What are they? Two spelling errors, and the semantic error that the sentence only has two errors while claiming to have three.

Pyramid solution

9-dots solution

Happiness

Martin Seligman is the author of the book Authentic Happiness. He describes how, in a study on creative problem solving, subjects who were put in a good mood produced much better results. Happy people perform better! Increasing happiness increases the likelihood of insight.

So here is a story to put a smile on your face and to increase your creativity.


A priest fell into a well, but managed to get a hold of a
small vine before falling into the abyss. When he had been 
hanging there for almost an hour, he heard a loud thunder and a voice from above.

"This is your God speaking, if you let go of the vine I will save you!"

The priest contemplated this for a while and then he yelled, "Is there anybody
else up there?"

Summary

Insights help us to understand and relate things. In order to have more insights we have to learn things deeply and then let go. The best way to learn new things is by retrieval practice. Our biggest obstacle to learning is that we fool ourselves that we know things we don't know. Happy people have more insights, so by seeing the light side of things we not only feel better, we perform better.

Monday, November 23, 2015

Simple Clustering with Docker Swarm and Nginx

Bringing up your own cluster has never been easier. The recent 1.0 release of Docker Swarm signals that the Docker team feel that Swarm is ready for production.

I've been running a bunch of applications on Docker for a while now, but I have managed the containers on the single machine level instead of as a cluster. With the release of Swarm 1.0, I believe it is time to start clustering my machines.

Spinning Up the Swarm

How to spin up a Swarm for development is well described in the Docker documentation, and I'm not going to describe it in depth here. I'll settle for the commands, with extra documentation where I feel it is called for.

I'm using the Swarm for development with VirtualBox here, but it is simple to substitute any of the supported docker-machine providers.

Create a Token

Create a token with the Docker Hub discovery service. When running this in production, you should probably set up an alternate discovery backend to avoid the external dependency.

# Create and save a token, using the Docker-Hub discovery service, default
$ token=$(docker run swarm create)

Create a Swarm Manager

The swarm manager will be used to control the swarm. It should be protected from access by anyone but you. I'll simulate this here by setting --engine-label public=no. This is just a label, and you have to make sure that you actually set up the manager protected from public access. It is possible to use multiple labels to tag the engine with all the qualities of this machine.

# Create a swarm manager using the token
$ docker-machine create \
  -d virtualbox \
  --swarm \
  --swarm-master \
  --swarm-discovery token://$token \
  --engine-label public=no \
  swarm-master

Create a Publicly Accessible Machine

In this demo I'm only spinning up another VirtualBox machine and I'm giving it the --engine-label public=yes to allow me to discover this box in the swarm.

# Create a new node named frontend and label it public
$ docker-machine create \
  -d virtualbox \
  --swarm \
  --swarm-discovery token://$token \
  --engine-label public=yes \
  frontend

Create a Couple of Additional Non-Public Machines

Here I start a couple of machines with an additional --engine-label: one with model=high-memory and one with model=large-disk.

# Create two more nodes named backend1 and backend2, with label public=no
$ docker-machine create \
  -d virtualbox \
  --swarm \
  --swarm-discovery token://$token \
  --engine-label public=no \
  --engine-label model=high-memory \
  backend1

$ docker-machine create \
  -d virtualbox \
  --swarm \
  --swarm-discovery token://$token \
  --engine-label public=no \
  --engine-label model=large-disk \
  backend2

List the Swarm

# List your machines
$ docker-machine ls
NAME           ACTIVE   DRIVER       STATE     URL                         SWARM
backend1       -        virtualbox   Running   tcp://192.168.99.103:2376   swarm-master
backend2       -        virtualbox   Running   tcp://192.168.99.104:2376   swarm-master
frontend       -        virtualbox   Running   tcp://192.168.99.102:2376   swarm-master
swarm-master   -        virtualbox   Running   tcp://192.168.99.101:2376   swarm-master (master)

Connect to the Swarm

Configure the docker client to connect to it.

# List the environment needed to connect to the swarm
$ docker-machine env --swarm swarm-master
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.101:3376"
export DOCKER_CERT_PATH="/Users/andersjanmyr/.docker/machine/machines/swarm-master"
export DOCKER_MACHINE_NAME="swarm-master"
# Run this command to configure your shell:
# eval "$(docker-machine env --swarm swarm-master)"

# Configure docker to use the swarm-master
$ eval $(docker-machine env --swarm swarm-master)

# List information about the cluster, output is trimmed
$ docker info
Containers: 4
Images: 4
Role: primary
Strategy: spread
Filters: health, port, dependency, affinity, constraint
Nodes: 4
 backend1: 192.168.99.103:2376
   Containers: 1
   Reserved CPUs: 0 / 1
   Reserved Memory: 0 B / 1.021 GiB
   Labels: model=high-memory, provider=virtualbox, public=no, storagedriver=aufs
 backend2: 192.168.99.104:2376
   Containers: 5
   Reserved CPUs: 0 / 1
   Reserved Memory: 0 B / 1.021 GiB
   Labels: model=large-disk, provider=virtualbox, public=no, storagedriver=aufs
 frontend: 192.168.99.102:2376
   Containers: 1
   Reserved CPUs: 0 / 1
   Reserved Memory: 0 B / 1.021 GiB
   Labels: provider=virtualbox, public=yes, storagedriver=aufs
 swarm-master: 192.168.99.101:2376
   Containers: 2
   Reserved CPUs: 0 / 1
   Reserved Memory: 0 B / 1.021 GiB
   Labels: provider=virtualbox, public=no, storagedriver=aufs
CPUs: 4
Total Memory: 4.086 GiB
Name: fa2d554280ff

Starting the Containers

Now it is time to start the containers. The plan is to bring up two database containers, Postgres and Redis, two counter web-services, and one proxy to front the whole cluster, like this.

Alright, let's start some containers!

Databases

According to the picture above, I want to put the Redis container on the machine named backend1, but I don't want to address it by name; instead I'm going to target it by its labels.

I also want to start a Postgres container on a machine with a constraint:model==large-disk.

Starting Redis

# Start Redis on a non-public machine with high-memory.
$ docker run -d --name redis \
  --env constraint:public!=yes \
  --env constraint:model==high-memory \
  redis

In this case, constraint:public!=yes is not needed but I like to add it to avoid mistakes.

Starting Postgres

# Start Postgres on a non-public machine with large-disk
$ docker run -d --name postgres \
  --env constraint:public!=yes \
  --env constraint:model==large-disk \
  postgres

If this was not a VirtualBox machine I would also mount a volume, -v /var/pgdata:/var/lib/postgresql/data, for the database, but this does not work with VirtualBox.

OK, let's see what we have.

# List running containers, output slightly trimmed
$ docker ps
CONTAINER ID     IMAGE       COMMAND                  PORTS            NAMES
aa1679b3da5c     postgres    "/docker-entrypoint.s"   5432/tcp         backend2/postgres
ffa41d90f414     redis       "/entrypoint.sh redis"   6379/tcp         backend1/redis

Nice, two running databases on the designated machines.

Starting the Reverse Proxy

Nginx is one of my favorite building blocks when it comes to building reliable web services. Nginx provides an official Docker image, but in this case, when I want to automatically configure Nginx when new containers are started, I prefer to use an alternative image called nginx-proxy.

A container started from the nginx-proxy image listens to events generated by the docker engine. The engine generates events for all kinds of actions, but all we care about here is when a container is started and stopped. If you want to see what events are triggered from the CLI, run docker events in one terminal and start and stop a few containers in another.
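For example, something like this (the redis image is just an arbitrary example):

# Terminal 1: stream the engine events
$ docker events

# Terminal 2: generate start and stop events
$ docker run -d --name tmp redis
$ docker stop tmp && docker rm tmp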

When nginx-proxy receives an event saying that a container has been started, it checks if the container has any ports EXPOSEd. If it does, it also checks for a VIRTUAL_HOST environment variable. If both these conditions are fulfilled, nginx-proxy re-configures its Nginx server and reloads the configuration.

When you now access the VIRTUAL_HOST, Nginx proxies the connection to your web service. Cool!

Naturally, you will have to configure your DNS to point to your Nginx server. The easiest way to do this is to configure all your services to point to it with a wildcard record. Something like this:

*.mysite.com     Host (A)    Default     xxx.xxx.xxx.xxx

In this case, we are using VirtualBox, so we can settle for adding the IP number of our frontend to the /etc/hosts file.

# /etc/hosts
redis-counter.docker    192.168.99.102
postgres-counter.docker 192.168.99.102

What is even cooler is that events work with Swarm, making it possible to use nginx-proxy to listen for services that are started on different machines. All we have to do is configure it correctly.

Starting Nginx-Proxy

nginx-proxy is started with configuration read from the docker client environment variables. All the environment variables were automatically set when you configured the docker client to access the Swarm, above.

# Start nginx-proxy configured to listen to swarm events, published on port 80.
$ docker run -d --name nginx \
  -v $DOCKER_CERT_PATH:$DOCKER_CERT_PATH \
  -p "80:80" \
  --env constraint:public==yes \
  --env DOCKER_HOST \
  --env DOCKER_CERT_PATH \
  --env DOCKER_TLS_VERIFY \
   jwilder/nginx-proxy

OK, we are almost done. Now it is time to start the web services.

Starting Web Services

As a web service I'm going to use a simple counter image, since it can use both Postgres and Redis as a backend. I want to start the web services on the same server as the databases, since this allows me to use --link to connect to the containers and it will speed up data access. To do this I can use an affinity constraint: --env affinity:container==*redis*.

# Start a counter close to the container named redis and link to it.
$ docker run -d --name redis-counter \
  -p 80 \
  --link redis \
  --env affinity:container==*redis* \
  --env REDIS_URL=redis:6379 \
  --env VIRTUAL_HOST=redis-counter.docker \
  andersjanmyr/counter

The affinity constraint is not really necessary, since affinity constraints are automatically generated by Swarm when --link is present, as you can see when we start the postgres-counter.

# Start a counter close to the container named postgres and link to it.
$ docker run -d --name postgres-counter \
  -p 80 \
  --link postgres \
  --env POSTGRES_URL=postgres://postgres@postgres \
  --env VIRTUAL_HOST=postgres-counter.docker \
  andersjanmyr/counter

Browse to http://redis-counter.docker or http://postgres-counter.docker and you should see your services up and running.
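Or check them from the command line, assuming the /etc/hosts entries from above are in place:

$ curl -s http://redis-counter.docker/
$ curl -s http://postgres-counter.docker/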

Summary

Here's an illustration of our current setup:

And here is a listing of all the containers on their respective machines.

$ docker ps
CONTAINER ID        IMAGE                  COMMAND                  PORTS                                NAMES
b3869a89e76c        andersjanmyr/counter   "/counter-linux"         192.168.99.104:32768->80/tcp         backend2/postgres-counter
cff69b6f970d        andersjanmyr/counter   "/counter-linux"         192.168.99.103:32768->80/tcp         backend1/redis-counter
64af31135c26        jwilder/nginx-proxy    "/app/docker-entrypoi"   443/tcp, 192.168.99.102:80->80/tcp   frontend/nginx
aa1679b3da5c        postgres               "/docker-entrypoint.s"   5432/tcp                             backend2/postgres,backend2/postgres-counter/postgres
ffa41d90f414        redis                  "/entrypoint.sh redis"   6379/tcp                             backend1/redis,backend1/redis-counter/redis

May the Swarm be with you! :D