Tuesday 1 April 2014

Sorting and Efficiency

I really hope I haven't missed a SLOG.

Sorting is an interesting topic in that there are so many ways to do it, but it's incredibly important to use the one best suited to a given situation.  In high school I learned a massive number of sorts (I can barely remember them all), but the one I remember using the most is the rather inefficient bubblesort.  Since then I've also learned quicksort, shellsort, selection sort, insertion sort, and radix sort.  I particularly enjoy radix sort: it's fast and quite easy to code, which is absolutely wonderful.

However, the most interesting part of sorting is still efficiency, and this comes from a basic understanding of how a sort works (and I was surprised how easily this stuff can be picked up).  Basically, just count the number of times something happens, and follow these rules (my own thing - sketched in code below):
a) if the list gets cut down by a constant factor each step, the time is O(log n) or lower
b) if it takes the same fixed number of steps no matter the input, the time is O(1)
c) if counting each item means counting everything else as well, the time is probably O(n^2)
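
Here's a rough sketch of those rules in Python (my own illustrative examples, not course code):

def first_item(lst):
    # rule b: one fixed step no matter how long the list is, so O(1)
    return lst[0]

def halving_steps(n):
    # rule a: the input gets cut in half every pass,
    # so the count of passes grows like log n
    steps = 0
    while n > 1:
        n = n // 2
        steps += 1
    return steps

def count_pairs(lst):
    # rule c: every item gets counted against every other item,
    # so the total count is n^2
    count = 0
    for a in lst:
        for b in lst:
            count += 1
    return count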

The hardest part about the concept is finding the worst possible time - which I thought at first just meant the longest possible time, although that's simply not true.  Since the second term test, I have had to redefine my understanding: the worst possible time comes from whichever input of a given size forces the most work, compared to the other possible inputs.  As a result it's not just about considering the longest possible list to sort, but about which cases break the rules above.  For example, the midterm had a question where the list would be cut to 1/10 the size, and then a search would be performed from there.  The answer deals with thinking about when a list can't be cut into 10 pieces - which would be for any list of size 9 or lower.  In those cases, the entire list has to be searched, meaning the worst possible time is O(n).  (FOR A TA MARKING THIS PLEASE CORRECT ME IF I'M WRONG HERE COMMENT BELOW THANKS)

I feel that efficiency is incredibly important in many algorithms and I always try to take it into account when designing them.  An algorithm with too long a runtime is just unwieldy and useless (which is also why I changed my regex p2 algorithm quite substantially to minimize the runtime).  In the future I will try to learn more about runtimes and how to reduce them.



Tuesday 25 March 2014

Trees

There's a midterm tomorrow and I've decided to do this at like 1:30 in the morning.  WOO

I think it's really cool how much I can relate programming knowledge to the linguistics course I'm taking.

A slight background first - trees.  Trees are cool.  In programming, trees allow for quick order-based searching of just about any amount of data: take whatever you have and put it in some order.  I've already gone through several instances where trees can be used in past posts (like my prefix/infix/postfix post or my regex post), so I won't go into it too much here.  However, I do appreciate the creation of the binary search tree, where everything on the left branch must be smaller and everything on the right must be larger than the node value.  Makes searching real easy.
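
Just how easy?  A quick sketch of a binary search tree lookup in Python (a toy node class of my own, not any course code):

class BSTNode:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def search(node, target):
    # Each step follows exactly one branch: left if the target is
    # smaller than the node value, right if it's larger.
    if node is None:
        return False
    if target == node.value:
        return True
    if target < node.value:
        return search(node.left, target)
    return search(node.right, target)

tree = BSTNode(5, BSTNode(3), BSTNode(8))
print(search(tree, 8))  # True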

In linguistics, instead, there is a definition for how to form a sentence in a binary-tree-like format.  We define any part of a sentence as a "phrase".  Thus "dog" is a noun phrase and "eat" is a verb phrase.

A phrase X has two components: a specifier, and the main phrase part (X').  The main phrase part also has two components: the head (the main part) and the complement (or continuation).  For example, in the noun phrase "a dog", "a" is the specifier and "dog" is the head of the main phrase part.  If we wanted to continue this, we could say "the dog with the bone": a noun phrase with a preposition phrase complement, where the preposition phrase in turn has a noun phrase complement.

This is sorted into trees.  Using the notation Value(Leftbranch, Rightbranch), "the dog with the bone" would be: NP(the, N'(dog, PP(P'(with, NP(the, N'(bone)))))).  Now I have a phrase in binary tree form.  It's also cool how the binary search tree analogy works here: the words that follow have a larger "value" than those preceding them due to their position in the phrase, so searching just follows the right branch downward until the right position is found.
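
Just for fun, here's that phrase as nested Python tuples in the same Value(Leftbranch, Rightbranch) notation (my own toy representation, nothing official from either course):

phrase = ('NP', 'the',
          ("N'", 'dog',
           ('PP', None,
            ("P'", 'with',
             ('NP', 'the',
              ("N'", 'bone', None))))))

def rightmost_word(node):
    # "Search" by following the right branch downward, as described.
    label, left, right = node
    if right is None:
        return left  # the head word at the bottom of the phrase
    return rightmost_word(right)

print(rightmost_word(phrase))  # bone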

There are more complicated concepts like moving wh- words, or inserting do into questions, but everything still manages to follow the same binary tree format as in programming.  How interesting.

Sunday 16 March 2014

Blogger's Block

So apparently instead of blogging by ranting about something, when you don't have anything to rant about, you can follow the Blogger's Block questions!

What's something new you learned in class?
- Well, sorting is not really a new concept; it's taught in high school no problem, especially the simpler sorts like quicksort and selection sort.  However, I did find the timing of sorts interesting.  I learned that the hierarchy works based on how fast one function grows compared to the next (its rate of increase), and that in computer terms, lg n is used (i.e. log base 2).  I want to learn more about, so I've heard, concepts called P and NP-completeness.  It has something to do with this?

What's something you enjoyed this week in class?
- My coffee.
- Kidding.  I enjoy anything new I learn.  Like above.

What was an achievement you accomplished this week and how did it relate to compsci?
- So I run this contest called NACLO (a computational linguistics contest) and they had a problem in their most recent round about finite state automata (transducers).  While I can't discuss the problem itself, I can relate the concept to the assignment 2 part 2 "is_regex" function.
My approach was to figure out, in FSA format, how to generate a regex string that makes sense.
Define value = ['0','1','2','e'] and bin_op = ['|', '.'].
Like building an FSA, you can only move from certain states to certain others.
( -> (, value
value -> bin_op, *, end, )
bin_op -> value, (
* -> ), end, bin_op
) -> *, bin_op, ), end
start -> (, value
This would eliminate most results.  Then there are just a few more rules to put in.
( -> ____ bin_op ____ -> )
start + -> (, value
start + value -> end or *
bin_op + value -> *, )
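
To make the idea concrete, here's a rough Python sketch of the transition-table check (an illustration only, not my actual assignment code - it enforces the main table plus balanced brackets, but not all of the extra rules above):

VALUES = {'0', '1', '2', 'e'}
BIN_OPS = {'|', '.'}

# For each kind of token, the kinds allowed to follow it;
# 'end' marks a legal stopping point.
FOLLOWS = {'start': {'(', 'value'},
           '(': {'(', 'value'},
           'value': {'bin_op', '*', ')', 'end'},
           'bin_op': {'value', '('},
           '*': {')', 'bin_op', 'end'},
           ')': {'*', 'bin_op', ')', 'end'}}

def kind(ch):
    # Classify a character into one of the token kinds above.
    if ch in VALUES:
        return 'value'
    if ch in BIN_OPS:
        return 'bin_op'
    if ch in '()*':
        return ch
    return None  # illegal character

def fsa_check(s):
    # Accept s only if every adjacent pair of characters follows the
    # transition table and the brackets balance.
    state, depth = 'start', 0
    for ch in s:
        k = kind(ch)
        if k is None or k not in FOLLOWS[state]:
            return False
        if ch == '(':
            depth += 1
        if ch == ')':
            if depth == 0:
                return False
            depth -= 1
        state = k
    return depth == 0 and 'end' in FOLLOWS[state]

print(fsa_check('(0|1)'))  # True
print(fsa_check('(01)'))   # False
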
In any case, I just found it interesting how the contest I was proctoring and the problem I was working on both involved the same concept.  Also, the program worked.  Yay.

Yeah - hopefully I'll be back to ranting soon.   Probably about sorting.

Monday 10 March 2014

I hate my Internet

So I'm late again because my internet was out for Friday-Sunday, which is a full testament to my ability to plan things ahead of time and do things on time.  Wonderful.

So I am really mad about my Assignment 2 because I messed up twice: first, my string parser doesn't work to its full capability, and second, I messed up an accessor.  I think I'll leave it to the markers to figure out where those happened and PLEASE DON'T TAKE OFF TOO MANY MARKS I'M BEGGING YOU

I also learned (mostly because I didn't take csc108) about what tuples are, after my friend figured out that the best way to make a list immutable is just to use a tuple - which is essentially an immutable list.  HA.  wow.
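
The two-line version of that discovery (standard Python behaviour):

point = (1, 2, 3)  # a tuple: built like a list, but immutable
try:
    point[0] = 4
except TypeError as err:
    print(err)  # 'tuple' object does not support item assignment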

Regular expressions are something I tend to deal with in terms of generating sentences and words.  There's a bunch of pattern-matching stuff you can do with regular expressions that I still have to get better at (esp. in this one puzzle called Regex Golf at http://regex.alf.nu/)

That being said, I'm a lot more optimistic about Assignment 2 Part 2 - because it means I don't have to get part 1 right to get this part right!  And I KNOW I messed up part 1.  rgurhgrughurghrugh  Processing regexes should be a lot more fun.

Maybe I'll even have a proper blog post next week!

Tuesday 4 March 2014

Recursion

Oops.  I got caught up enough working on Assignment 2 that I forgot to do this.  *smiles innocently*

Recursion (i.e. a function calling itself, while it is still running, on a different input) is great in terms of its efficiency in breaking down a problem.  We see recursion everywhere, because most tasks are basically menial repetitions of other tasks.

Of course, in lecture, the prof keeps talking about a "recursive insight", and while I think it's more like "essentials to problem solving", it is necessary to know how to break down a problem and spot where you can use the same function over again.  For example, the Towers of Hanoi problem was clear: with 4 towers, the procedure is to run a solve on 4 towers for i blocks, then 3 towers for n-i blocks, then 4 towers for i blocks again - so the 4-towers procedure has to be recursive, since it calls itself for the first and third steps.

Now that we're learning about trees, we see that the same structure appears over and over again, which then allows us to recursively parse trees in their prefix/infix/postfix format.  Looking at a prefix example like last time:
+ 2 * 3 4
We actually just define a function that parses things like trees.  Here the head is the operation and the nodes are the arguments (values).
Given a function eval: store the operator as the head and each value as a node.  When 2 nodes appear, apply the operator to the two nodes and store the result as a node.
This becomes recursive when we realize that, shit, we might find another operator before the two nodes are filled - something else that needs to be evaluated first, like the * 3 4.

I find the hardest thing about recursion is probably the implementation.  What looks good in pseudocode usually sucks in actual programming - like that function just now.  Looks great, but looping through inputs isn't actually completely possible.  I guess with a string you could just say
equation = '+2*34'
then the evaluation function would look like
operator = equation[0]
if equation[1] in '+-*/':
     temp_eq = equation[1:]  # the rest of the equation, from that operator on
     node1 = eval(temp_eq)
or something like that, but that really looks nasty in code.
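
For the record, here's a less nasty, actually-runnable sketch of the idea (my own toy version that expects spaces between tokens - not assignment code):

OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b,
       '/': lambda a, b: a / b}

def eval_prefix(tokens, i=0):
    # Evaluate the prefix expression starting at tokens[i];
    # return (value, index of the next unread token).
    token = tokens[i]
    if token in OPS:
        # An operator: recursively evaluate its two arguments first.
        left, i = eval_prefix(tokens, i + 1)
        right, i = eval_prefix(tokens, i + 1)
        return OPS[token](left, right), i
    return float(token), i + 1  # a plain number

value, _ = eval_prefix('+ 2 * 3 4'.split())
print(value)  # 14.0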

So yeah.  I need to find something else to talk about next week.

Monday 17 February 2014

Prefix/Infix/Postfix

Why can't we all use prefix or postfix notation and get rid of those silly brackets?  I think the answer is people don't like spaces (and/or people have messy handwriting).

Brackets are annoying and quite confusing to read especially when there are a lot of them, and tracing which bracket goes where is even worse.  Consider:

((3*x) + 2) * ((4 / (sqrt((2*y) - (3*z)) - (((((3*4) + 2) - (17*x)) + (y^2)) - 14))) + 5)
 
All those brackets - and if you mess one up, retracing them is incredibly annoying.
Then we came up with putting the operation before or after the arguments (prefix and postfix respectively), thus doing away with brackets entirely.
Here's postfix: 
3 x * 2 + 4 2 y * 3 z * - sqrt 3 4 * 2 + 17 x * - y 2 ^ + 14 - - / 5 + *

Not only is it shorter but also much cleaner.  Every argument is clearly delimited (by spacing) and there can be no confusion about what is added to what and what isn't.  Reading left to right, you store values as they come, and every operation acts on the last two results stored.  Done.
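
That reading rule is short enough to sketch in Python (my own toy example):

def eval_postfix(tokens):
    ops = {'+': lambda a, b: a + b,
           '-': lambda a, b: a - b,
           '*': lambda a, b: a * b,
           '/': lambda a, b: a / b}
    stack = []
    for token in tokens:
        if token in ops:
            b = stack.pop()  # the last two results, in order
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()

print(eval_postfix('3 4 * 2 +'.split()))  # 14.0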

Prefix: 
* + * 3 x 2 + / 4 - sqrt - * 2 y * 3 z - + - + * 3 4 2 * 17 x ^ y 2 14 5
Read left to right: every operation is stored, and when 2 arguments come up, apply the most recently stored operation to them (the result then becomes an argument for the operation before it).  Also much easier.

I generally like postfix better, because storing values makes more sense to me: an equation processes values, and should thus output values instead of operations.

Either way, it's obvious that this notation doesn't take nearly as much space or effort to process - so what the hell are we still doing with that silly infix notation?  Spaces.  People don't like putting in spaces, and people couldn't be bothered to make their writing look neat.

Silly people.

Sunday 9 February 2014

Towers of Hanoi

I wanted to rant about recursion but apparently I have to save my pent-up anger about that until week 7.

Let's talk about that damn assignment Towers of Anne Hoy (or Reve's Puzzle).

It was pretty quick to design a fast algorithm (called Hanoi4):
For 1 cheese, just move it.
For 2 cheeses, move one to an intermediary, move the second to the destination, then move the first on top.
Then for n > 2 cheeses: recurse, apply the 2-cheese method, recurse again.

I found the optimal move counts for up to ~20 cheeses on the internet, and found that this quickly-designed, rough algorithm only matched the optimal number of moves up to about 5 cheeses.  Not very effective.

Refining the algorithm was not very difficult once I decided to actually look at the method stated in the problem: using the original Towers of Hanoi code inside the algorithm.  Thus it was simple to perform:
Hanoi3 - if 1 cheese, move it straight to the destination;
              else Hanoi3(cheeses - 1) to an intermediary, move from source to destination, then Hanoi3 again.
Hanoi4 - if 1 cheese, move it straight to the destination;
              else Hanoi4(cheeses/2) to an intermediary, Hanoi3(the rest of the cheeses on that stack), then Hanoi4 again.
This solution displayed optimal move counts up to about 8 cheeses, but upon reaching 9 cheeses, the solution would start being non-optimal again.
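
In rough Python, those two procedures look something like this (a sketch with made-up names, not my actual assignment code):

def hanoi3(n, source, intermediate, dest):
    # Classic 3-stool Towers of Hanoi.
    if n == 1:
        print(source, '->', dest)
    else:
        hanoi3(n - 1, source, dest, intermediate)
        print(source, '->', dest)
        hanoi3(n - 1, intermediate, source, dest)

def hanoi4(n, source, int1, int2, dest):
    # Park half the cheeses using all four stools, move the rest
    # with three stools, then finish with four stools again.
    if n == 1:
        print(source, '->', dest)
    else:
        i = n // 2  # the split that turned out to be non-optimal
        hanoi4(i, source, int2, dest, int1)
        hanoi3(n - i, source, int2, dest)
        hanoi4(i, int1, source, int2, dest)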

It was obvious that the proper number of cheeses to move to the intermediary, i, was not half the number of cheeses.  This prompted me to re-read the question once more for the clue.  Turns out it is ABSOLUTELY NECESSARY TO HAVE A FULL UNDERSTANDING OF THE PROBLEM BECAUSE THE ANSWER IS STARING YOU IN THE FACE.

FRAAAAAAAA-

How do you find the optimal solution?  Use the optimal moves.  How do you find the optimal number of moves? IT'S GIVEN TO YOU.

Then, having found the optimal number of moves using an algorithm, it's simple to find the optimal i.  This turned out to be:
1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4...

Doesn't even take a cryptographer to figure out the pattern on that one.
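
For the record, the recurrence is simple enough to sketch (made-up names again, not assignment code): with 4 stools, the optimal count is the best split between a 4-stool sub-solve and a 3-stool solve, and a 3-stool solve of k cheeses always takes 2**k - 1 moves.  The i that achieves the minimum is the optimal split (there are ties, so exactly which i you get depends on how you break them):

from functools import lru_cache

@lru_cache(maxsize=None)
def moves4(n):
    # Optimal number of moves for n cheeses on 4 stools.
    if n == 1:
        return 1
    return min(2 * moves4(i) + 2 ** (n - i) - 1 for i in range(1, n))

def best_i(n):
    # The split i that achieves the optimal count (ties broken low).
    return min(range(1, n), key=lambda i: 2 * moves4(i) + 2 ** (n - i) - 1)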

FRAAAAAAA-