Math, Machine Learning and Programming: November 2012

Wednesday, November 28, 2012

Lecture on Pattern Recognition/Machine Learning

Last semester I attended a lecture on Pattern Recognition/Machine Learning. The lecture was captured on video and can now be watched on youtube:

http://www.youtube.com/playlist?list=PLuRaSnb3n4kRDZVU6wxPzGdx1CN12fn0w

Sunday, November 25, 2012

Project Euler Problem 40

Euler problem 40 can be solved quite easily using an intuitive brute force approach.
First we generate a string of the decimal fraction that is long enough to hold each of the requested digits. Then we read each digit and calculate the product. With getDecimalFraction and getDigit defined seperately this reduces to just a few lines of code:

int main() {
  int max = 6;
  string num = getDecimalFraction(pow(10,max));
  int product = 1;
  for (int i = 0; i <= max; i++) {
    int digit = getDigit(num, pow(10, i));
    cout << digit  << "  ";
    product *= digit;
  }
  cout << "\n" << product << endl;
  return 0;
}

The functions used in this snippet are defined like this:

// calculate decimal fraction by concatenating positive integers
// maxDigit is the maximum digit that needs to be contained;
// as soon as the length of the fraction is greater or equal
// to maxDigit the fraction is returned
std::string getDecimalFraction(int maxDigit) {
  std::string fraction = "";
  int i = 1;
  while(fraction.size() < maxDigit) {
    std::stringstream number;
    number << i;
    fraction += number.str();
    i++;
  }
  return fraction;
}

int getDigit(std::string fraction, int n) {
  if (n > fraction.size())
    throw std::exception();
  if (n < 1)
    throw std::exception();
  n--;
  int digit = (int) fraction[n] - '0';
  return digit;
}

You could also think of a solution by hand as there is a rule of how many digits you add with each integer (9 x 1 digit for 1...9, 990 x 2 digits for 10...999, etc.). However, the C++ solution is easy to implement and gives a fast answer to the question. As usual, the code can be found on github.

Thursday, November 15, 2012

interesting youtube channel on ML

The youtube user mathematicalmonk has uploaded videos on various topics, most of which concerning machine learning, all of which are combined in this playlist. In his videos mathematicalmonk starts from the very basics of machine learning and advances to more sophisticated methods later on. He takes his time explaining the content in a very understandable manner and does not go too much into detail when it comes to math. Altogether his videos I recommend his videos for anyone who is interested in machine learning, starting to study machine learning, needs to refresh stuff learned a while ago or wants to get a general overview on this wide topic. Based on the videos you can always go more into detail on whatever machine learning topic you like.

Once again, the link to his channel.

Tuesday, November 13, 2012

Project Euler Problem 29

In Project Euler problem 29 you are to determine the number of unique terms in

$a^b \forall \; 2 \leq a,b \leq 100$

Unique means that any term is counted only once, e.g. $2^32 = 16^8 = 4^16$ is counted only once. You could - of couse - use a brute force method using large Integeres and the c++ container set. However, there is a more elegant approach that can be used to avoid explicit calculation of $a^b$.

If you - for once - ignore multiple occurences, it is quite straightforward to get the number of terms.

Each $a$ produces $100-2+1=99$ terms. As there are $100-2+1=99$ possible values for a, that yields $9801$ terms. From this number the number of duplicates has to be subtracted.

I tried to write an implementation in C++, but it did not come to the correct result. In the end it was much faster and easier to do the calculations and logic by hand:

First consider numbers $k$ that can be written as \k=(a^2\) with

$\forall \, i \in \mathbb{N}: \sqrt[i]{a} \not\in \mathbb{Q}$, i.e. $a \in \{2,3,5,6,7,10\}$

For $4^n = 2^{2n}$ it is easy to be seen that every term is a duplicate of 2$2^{\frac{2n}{2}}$ up to $2n = 100$, resulting in 49 duplicates. This is true for any $a$ given above. Therefore we have $6*49 = 294$ duplicates so far.

Secondly take into account numbers $k$ that can be written as a third power of another number, the latter of which cannot be written as a power of a smaller number:

$k=a^3, \forall \, i \in \mathbb{N}: \sqrt[i]{a} \not\in \mathbb{Q}$, i.e. $a \in \{2,3\}$

For $k^n = a^{3n}$ every term is a duplicate $\forall n \in [2, 33]$, resulting in 32 duplicates for each $a$. Furthermore, there are duplicates for which $a^{3n} = a^{2\tilde n}$. Obviously, all values $\in \leq 50$ have already been accounted for. The upper bound of the interval is given by $n_{max} = \frac{2}{3}\tilde n_{max}$ with $\tilde n_{max} = 100$. The lower bound is $34$. Only even numbers are to be taken into account. This results in another $17$ duplicates for each $a$. Altogether that is $98$ duplicates.

In the next step take into account the numbers that can be written as fourth powers of other numbers. In this case these are $k = 16 = 2^4$ and $k = 81 = 3^4$. In the same fashion as above we get 49 duplicates for the cases where $a^{4n} = a^{2\tilde n}$. In addition we have to consider the cases for which $a^{4n} = a^{3\tilde n}$. $n$ has to be a multiple of $3$ and also $51 \leq n \leq 75$, thus adding $9$ more duplicates for each $k$.

From now on only powers of $2$ are to be considered, as $3^5 = 243 > 100$. We now need to find $n$, s.t.

$2^{5n} = 2^{i\tilde n} \forall i \in [1,4]$

For $i = 1$ n goes from $2$ to $20$, giving $19$ duplicates. If $i = 2$ there are duplicates for all even $n \in [22, 40]$, that is $10$ duplicates. For $i = 3$ all multiples of 3, $n = 3k \in [21, 60]$ are duplicate. The duplicates already accounted for for $i = 2$, i.e. $\{24, 30, 36\}$ have to be ignored, i.e. $11$ duplicates. Finally, for $i = 4$ all multiples of 4 in the range $[44, 80]$ have to be counted, except for $60$, which gives $8$ duplicates. That makes for 48 duplicates.

The last case is $64 = 2^6 = 8^2$. Using the same considerations as before, the number of duplicates can easily be determined to be 62:

$64^n = 8^{\tilde n} \Rightarrow 49 $

$2^{6n} = 2^{4\tilde n} \Rightarrow 8 $

$2^{6n} = 2^{5\tilde n} \Rightarrow 5 $

Adding up all duplicates we get:

$294+98+116+48+64 = 618$

So in the end there are $9801 - 618 = 9183$ unique terms.

Thursday, November 8, 2012

Project Euler Problem 18

Many of you may know project Euler. It is a website posing interesting programming tasks, varying in difficulty. About a year ago I had a first look at it, but for lack of time I did not solve too many of the problems. Today I felt in the mood to get hands on that again and I had not solved problem 18 up till now:

http://projecteuler.net/problem=18

You are given a triangle of numbers and your task is to calculate the maximum sum going from top to bottom, while you can go either on step to the left or one step to the right on your way down.
The first, naive approach is to use brute force and calculate every possible sum. However, as stated at project Euler, for larger triangles computation time will soon be too long.

So here is the approach I was following:

At each point in the triangle (except for the top and edges) the maximum possible sum is the number at that point plus the maximum of the sum at the two predecssing points, i.e. left and right predecessors.

Using this you avoid brute force and can solve larger the same problem for larger triangles, such as given in problem 67, in a reasonable amount of time.

http://projecteuler.net/problem=67

Based on what I wrote you can use any language you like to implement your solution.
If you feel the need to look at my implementation, go to my git my git repository.
It gave me the correct answers for both problem 18 and problem 67.

Remark: This algorithm might remind you of message passing algorithms such as max-product that are used for inference in graphical models.