The Hyper Logger

The other day while working at Chipmunks & Lumberjacks, I was asked to create a computer system to count the distinct set of leaf shapes, sizes and colors of every tree in the forest. I know what you are thinking! Why would anyone want to know this information? For C&L it’s really important; they use it when scoring whether a tree can be harvested.

The naive implementation of this system would be to add all of the leaves into a database and do distinct queries. This system might work for a while, but what happens when the total number of trees and their associated leaves climbs into the billions or trillions? Will it still perform? The short answer is no; the longer answer is, absolutely not, are you kidding me!

Probabilistic Data Structures

So, what is a geek working for C&L to do? Enter the probabilistic data structure HyperLogLog. This sleek and beautiful data structure will tell you the distinct count of items entered into it with a typical error of about 2%. Ok now, come on, 2% is not that bad, especially when you can count 10^9 distinct items in just 1.5 KB of memory. HyperLogLog might be the best thing to grow out of the 80’s besides Nintendo and the Sony Walkman: its roots go back to the Flajolet-Martin probabilistic counting work of 1985, although HyperLogLog itself was published in 2007.

HyperLogLog data structures have Add, Count and Merge functions. Add will insert a new value into the data structure. Count will tell you the distinct number of items in the data structure. Merge will take two HyperLogLog data structures and return a new combined data structure with only the distinct set of the two data structures.
Add and Count are the meat and potatoes of the HyperLogLog data structure but don’t discount Merge. Some very interesting features can be created with Merge. For C&L we needed the 1, 7 and 30 day counts for our tree leaves. To save some space we just used one data structure per day and merged the days to find the week and month values. Neat, right?
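
To make Add, Count and Merge a little more concrete, here is a minimal Python sketch. It is not the code C&L runs; the class name, the SHA-1 based hash, the choice of 2^11 registers (which gives roughly a 2% standard error) and the leaf names are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with 2**p registers (not production code)."""

    def __init__(self, p=11):
        self.p = p
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def _hash(self, value):
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        x = self._hash(value)
        idx = x >> (64 - self.p)                  # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank            # registers only ever grow

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)          # bias constant for m >= 128
        estimate = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            estimate = m * math.log(m / zeros)
        return int(round(estimate))

    def merge(self, other):
        if self.p != other.p:
            raise ValueError("both structures must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


# One structure per day, merged to get a multi-day distinct count.
day1 = HyperLogLog()
day2 = HyperLogLog()
for i in range(10_000):
    day1.add(f"leaf-{i}")
for i in range(5_000, 15_000):      # half of these were already seen on day 1
    day2.add(f"leaf-{i}")

two_days = day1.merge(day2)
# Estimates should land within a few percent of 10000, 10000 and 15000.
print(day1.count(), day2.count(), two_days.count())
```

Because a register only changes when a larger rank shows up, adding a value that is already in the structure leaves it untouched, and merging is just a register-by-register maximum.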

Benefits

There are some significant side benefits of using this type of data structure over the typical row in a database. Three of them are GDPR compliance, easy replay and size.

For GDPR, if you are just exchanging the data structures, there is no recognizable information stored inside them: the Add command hashes the original value and uses that hash only to update a small register.

Replay is all about recovering from a distributed system failure. Inserting the same value into a HyperLogLog data structure a second time has no effect on the count; it’s basically a no-op, so replaying messages after a failure is safe.

You have to admit that counting a bazillion items in 1.5 kilobytes is pretty neat. The data size makes sharing and transmitting the structure between data centers, or between servers in the same center, relatively easy.

A Red Bird

The new system named REDBird is built and deployed to multiple forests around the world and C&L is very happy with the new system. If you have a cardinality challenge at work, you might want to learn more about probabilistic data structures and HyperLogLog.

Learning Clojure

It’s that time of year again at work. Code Freeze. The time of year where the code in production needs to be highly stable and predictable, as opposed to the rest of the year where it needs to be highly stable and predictable 😜. The code is not truly frozen; it’s kind of slushy, but the benefit is that I can focus some of my extra energies on learning something new or something old like Lisp.

Yes, Lisp

Those of you who know me know that I had a ball learning from “The Land of Lisp” with its catchy show-tune-like videos and from “The Realm of Racket” book with all of its interactive games. My next fun coding book is going to be “Clojure for the Brave and True”. The book looks like a lot of fun and as a bonus I will get more exposure to Java and the JVM.

The learning plan is to work through the book during “code freeze” and hopefully do some fun side projects with Clojure. I always find it better to “use it in anger” to really learn a topic instead of just following a prescribed plan. I don’t have any idea what the side project will be yet but I think the book will give me some good ideas.

The GitHub repo has been created and I have my IDE and REPL set up and working. All that’s left is the fun part, reading and learning.

My Personal C# Style Guide

The other day I was asked at work to write a style guide for our C# practice. So what is a developer to do? Googling it was a good place to start, but none of the ones I found quite fit. Some were 80 pages long and made me nauseous and others completely lacked any detail.

I settled on a modified corefx style guide. So, what do you think? I will be adding to this guide in the future and wanted to make sure I didn’t misplace it again. It will now live on in blog form, at least for now.

1 – Use Allman style braces, where each brace begins on a new line. A single line statement block can go without braces but the block must be properly indented on its own line and it must not be nested in other statement blocks that use braces.

2 – Use four spaces of indentation (no tabs).

3 – Use camelCase for internal and private fields and use readonly where possible. When used on static fields, readonly should come after static (i.e. static readonly not readonly static).

4 – Avoid this. unless absolutely necessary.

5 – Always specify the visibility, even if it’s the default (i.e. private string foo not string foo). Visibility should be the first modifier (i.e. public abstract not abstract public).

6 – Namespace imports should be specified at the top of the file, outside of namespace declarations and should be sorted alphabetically.

7 – Avoid more than one empty line at any time. For example, do not have two blank lines between members of a type.

8 – Avoid spurious free spaces. For example avoid if (someVar == 0)..., where the dots mark the spurious free spaces. Consider enabling “View White Space (Ctrl+E, S)” if using Visual Studio, to aid detection.

9 – Do not commit large blocks of commented-out code. Use the source code repository for this feature. All commented-out code in an actively edited file should be removed. Please do not go through the code base deleting commented-out code just to check it in.

10 – Do not create an interface unless there are two or more non-test scenario implementations that use that interface.

11 – Within a class, struct, or interface, elements should be positioned in the following order:

  • Constants
  • Fields
  • Constructors
  • Finalizers (Destructors)
  • Delegates
  • Events
  • Enums
  • Interfaces
  • Properties
  • Indexers
  • Methods
  • Structs
  • Classes

static elements have to appear before instance elements.

Elements should be ordered by access:

  • public
  • internal
  • protected internal
  • protected
  • private

12 – Avoid putting multiple top-level classes/interfaces/enums in the same file.

13 – Avoid the use of regions in code unless it is surrounding auto-generated code. Do not use regions to separate the different types of class members.

14 – Use var when it’s obvious what the variable type is (i.e. var stream = new FileStream(...) not var stream = OpenStandardInput()).

15 – We use language keywords instead of BCL types (i.e. int, string, float instead of Int32, String, Single, etc) for both type references as well as method calls (i.e. int.Parse instead of Int32.Parse). See issue 391 for examples.

16 – We use PascalCasing to name all our constant local variables and fields. The only exception is for interop code where the constant value should exactly match the name and value of the code you are calling via interop.

17 – We use nameof(...) instead of "..." whenever possible and relevant.

18 – When including non-ASCII characters in the source code use Unicode escape sequences (\uXXXX) instead of literal characters. Literal non-ASCII characters occasionally get garbled by a tool or editor.

19 – If a file happens to differ in style from these guidelines, update the file if you are actively working on that file.

Use the .NET Codeformatter Tool to ensure a code base maintains a consistent style over time. The tool automatically fixes the code base to conform to the guidelines outlined above.

Example File:

ObservableLinkedList.cs:

using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.ComponentModel;
using System.Diagnostics;
using Microsoft.Win32;

namespace System.Collections.Generic
{
    public partial class ObservableLinkedList<T> : INotifyCollectionChanged, INotifyPropertyChanged
    {
        private ObservableLinkedListNode<T> head;
        private int count;

        public ObservableLinkedList(IEnumerable<T> items)
        {
            if (items == null)
                throw new ArgumentNullException(nameof(items));

            foreach (T item in items)
            {
                AddLast(item);
            }
        }

        public event NotifyCollectionChangedEventHandler CollectionChanged;

        public int Count
        {
            get { return count; }
        }

        public ObservableLinkedListNode<T> AddLast(T value)
        {
            var newNode = new ObservableLinkedListNode<T>(this, value);

            // InsertNodeBefore lives in another part of this partial class.
            InsertNodeBefore(head, newNode);
            return newNode;
        }

        protected virtual void OnCollectionChanged(NotifyCollectionChangedEventArgs e)
        {
            NotifyCollectionChangedEventHandler handler = CollectionChanged;
            if (handler != null)
            {
                handler(this, e);
            }
        }
    }
}

This is a Visual Studio 2013 .vssettings file source for enabling C# auto-formatting conforming to the above guidelines. Note that rules 7 and 8 are not covered by the vssettings, since these are not rules currently supported by VS formatting.

<UserSettings>
    <ApplicationIdentity version="12.0"/>
    <ToolsOptions>
        <ToolsOptionsCategory name="TextEditor" RegisteredName="TextEditor">
            <ToolsOptionsSubCategory name="AllLanguages" RegisteredName="AllLanguages" PackageName="Text Management Package"/>
            <ToolsOptionsSubCategory name="CSharp" RegisteredName="CSharp" PackageName="Text Management Package">
                <PropertyValue name="TabSize">4</PropertyValue>
                <PropertyValue name="InsertTabs">false</PropertyValue>
                <PropertyValue name="IndentSize">4</PropertyValue>
                <PropertyValue name="BraceCompletion">true</PropertyValue>
            </ToolsOptionsSubCategory>
            <ToolsOptionsSubCategory name="CSharp-Specific" RegisteredName="CSharp-Specific" PackageName="Visual C# Language Service Package">
                <PropertyValue name="NewLines_QueryExpression_EachClause">1</PropertyValue>
                <PropertyValue name="Space_Normalize">0</PropertyValue>
                <PropertyValue name="Space_AroundBinaryOperator">1</PropertyValue>
                <PropertyValue name="Formatting_TriggerOnPaste">1</PropertyValue>
                <PropertyValue name="NewLines_Braces_Method">1</PropertyValue>
                <PropertyValue name="Indent_CaseLabels">1</PropertyValue>
                <PropertyValue name="Formatting_TriggerOnBlockCompletion">1</PropertyValue>
                <PropertyValue name="CodeDefinitionWindow_DocumentationComment_IndentOffset">2</PropertyValue>
                <PropertyValue name="NewLines_Braces_ControlFlow">1</PropertyValue>
                <PropertyValue name="NewLines_Braces_AnonymousMethod">0</PropertyValue>
                <PropertyValue name="Space_WithinOtherParentheses">0</PropertyValue>
                <PropertyValue name="Wrapping_KeepStatementsOnSingleLine">1</PropertyValue>
                <PropertyValue name="Space_AfterBasesColon">1</PropertyValue>
                <PropertyValue name="Indent_Braces">0</PropertyValue>
                <PropertyValue name="Wrapping_IgnoreSpacesAroundVariableDeclaration">0</PropertyValue>
                <PropertyValue name="Space_WithinMethodCallParentheses">0</PropertyValue>
                <PropertyValue name="Space_AfterCast">0</PropertyValue>
                <PropertyValue name="NewLines_Braces_CollectionInitializer">0</PropertyValue>
                <PropertyValue name="NewLines_AnonymousTypeInitializer_EachMember">1</PropertyValue>
                <PropertyValue name="NewLines_Keywords_Catch">1</PropertyValue>
                <PropertyValue name="NewLines_Braces_ObjectInitializer">0</PropertyValue>
                <PropertyValue name="NewLines_Braces_ArrayInitializer">0</PropertyValue>
                <PropertyValue name="Space_WithinExpressionParentheses">0</PropertyValue>
                <PropertyValue name="Space_InControlFlowConstruct">1</PropertyValue>
                <PropertyValue name="Formatting_TriggerOnStatementCompletion">0</PropertyValue>
                <PropertyValue name="NewLines_Keywords_Finally">1</PropertyValue>
                <PropertyValue name="Space_BetweenEmptyMethodDeclarationParentheses">0</PropertyValue>
                <PropertyValue name="Indent_UnindentLabels">0</PropertyValue>
                <PropertyValue name="NewLines_ObjectInitializer_EachMember">1</PropertyValue>
                <PropertyValue name="NewLines_Keywords_Else">1</PropertyValue>
                <PropertyValue name="Space_WithinMethodDeclarationParentheses">0</PropertyValue>
                <PropertyValue name="Space_BetweenEmptyMethodCallParentheses">0</PropertyValue>
                <PropertyValue name="Space_BeforeSemicolonsInForStatement">0</PropertyValue>
                <PropertyValue name="Space_BeforeComma">0</PropertyValue>
                <PropertyValue name="Space_AfterMethodCallName">0</PropertyValue>
                <PropertyValue name="Space_AfterComma">1</PropertyValue>
                <PropertyValue name="Wrapping_IgnoreSpacesAroundBinaryOperators">0</PropertyValue>
                <PropertyValue name="Space_BeforeBasesColon">1</PropertyValue>
                <PropertyValue name="Space_AfterMethodDeclarationName">0</PropertyValue>
                <PropertyValue name="Space_AfterDot">0</PropertyValue>
                <PropertyValue name="NewLines_Braces_Type">1</PropertyValue>
                <PropertyValue name="Space_AfterLambdaArrow">1</PropertyValue>
                <PropertyValue name="NewLines_Braces_LambdaExpressionBody">0</PropertyValue>
                <PropertyValue name="Space_WithinSquares">0</PropertyValue>
                <PropertyValue name="Space_BeforeLambdaArrow">1</PropertyValue>
                <PropertyValue name="NewLines_Braces_AnonymousTypeInitializer">0</PropertyValue>
                <PropertyValue name="Space_WithinCastParentheses">0</PropertyValue>
                <PropertyValue name="Space_AfterSemicolonsInForStatement">1</PropertyValue>
                <PropertyValue name="Indent_CaseContents">0</PropertyValue>
                <PropertyValue name="Indent_FlushLabelsLeft">1</PropertyValue>
                <PropertyValue name="Wrapping_PreserveSingleLine">1</PropertyValue>
                <PropertyValue name="Space_BetweenEmptySquares">0</PropertyValue>
                <PropertyValue name="Space_BeforeOpenSquare">0</PropertyValue>
                <PropertyValue name="Space_BeforeDot">0</PropertyValue>
                <PropertyValue name="Indent_BlockContents">1</PropertyValue>
                <PropertyValue name="SortUsings_PlaceSystemFirst">1</PropertyValue>
                <PropertyValue name="SortUsings">1</PropertyValue>
                <PropertyValue name="RemoveUnusedUsings">1</PropertyValue>
            </ToolsOptionsSubCategory>
        </ToolsOptionsCategory>
    </ToolsOptions>
</UserSettings>

Ruby and Docker

I maintain the Ruby Client wrapper for my company’s REST API. I’m not exactly sure how this happened. I might have volunteered, or it could have been bad luck. Either way, it was time for a little bug fixing and as it so happens, I got a new laptop a couple of weeks ago. You know what that means; starting from scratch, again.

This time will be different. I’ll do it right this time, I tell myself. Maybe a VM, or a Vagrant file, or something hot like Docker.

Oooh, the new hotness, Docker; that would be fun.

I’ve been learning about Docker and Kubernetes for a bit so creating a development environment for some quick Ruby fixes would be fun and educational. Especially, since I already have Docker installed.

With a five-line Dockerfile and a Ruby 2.2 base image, I was set and ready. A one-line bug fix, a version bump and our Ruby Gem was updated. So smooth it was impressive and exciting.

My Dockerfile

FROM ruby:2.2
RUN apt-get update && apt-get install -y build-essential
RUN mkdir -p /app
WORKDIR /app
RUN gem install bundler

With a docker build -t devtheruby . and a docker run -it -v $(pwd):/app devtheruby bash, I had my environment up and running. With the volume mount, I could edit the code from VS Code on the host and build and run tests in the connected terminal. Not a bad way to work.

Docker was really easy to set up and as a bonus, I added the Dockerfile to the Git repository for future me. I try to be nice to future me; he has to fix all my mistakes. For quick fixes, this seems like a good alternative to VMs. I’m just not sure if this is the way I would want to develop every day. I think I’m going to leave that up to future me.

Attendance with Beacons

This post is going to be a little different than my normal Xamarin.Forms posts. I want you all to know that I am more than a pretty interface. I have architectural and distributed systems chops as well.

I currently work for a local university that needs a solution for taking attendance for the first 3 weeks of every semester. You see, the university I work for is not an attendance taking university, but we need to satisfy a government requirement to take a student’s attendance to get financial aid. So I was asked to solve this little problem using mobile.

Here was the challenge

Students will walk into class and have their attendance taken automatically without the teacher’s involvement. Magically Delicious!

Beacons

To solve the problem, I combined some very cool technologies. One of them, of course, is Xamarin and Xamarin.Forms, but it’s not the star of this blog post. The star is iBeacons, specifically Estimote Beacons.

If you don’t know what an iBeacon is, don’t worry. Think of them as tiny radio stations that transmit their signal over Bluetooth. The Bluetooth signal can range from a few feet to over 200 feet for each station.

The beacons broadcast a small amount of data every few seconds to receivers that are tuned to their specific address. Each classroom at the university will get its own beacon and it will broadcast the building and the room number of the classroom.

When a phone comes in contact with the beacon’s signal, the phone wakes the Xamarin app up and gives it 10 seconds to do its work. This works even if your app is not running. Neat!

Beacon Phone Home

Ok, now that the phone has come into range of our classroom radio station, what do we do now? For this application, the app transmits the beacon data and the student’s identity to the cloud. We call this message a beacon ping. To the cloud!

The university’s student mobile application is backed by a Node.js Azure Mobile Service. I know, I use C# on the phone and JavaScript on the server, the irony!

As you can imagine, we will have a lot of pings coming into the Azure front ends. Today most of that data will be used for attendance but we have no idea where the school will place beacons in the future. Maybe they’ll put them in front of the food venues to figure out which is the most popular. Or put them on the light posts outside to see what walkways are the busiest. We just don’t know what the genie will do once it’s let out of the bottle.

Beacon PubSub

With all the unknowns for our system, flexibility was something we had to build in from the beginning. I decided to use the Azure Service Bus in a PubSub configuration to handle today’s and tomorrow’s requirements. This configuration is very powerful and it’s so simple to set up it borders on criminal.

Our day 1 configuration for the Beacon PubSub was for the Azure Mobile Service to take in pings from the handset and push them onto the “BeaconPing” topic inside of the Azure Service Bus. Two subscriptions are added to the topic, one for taking attendance and one for auditing.
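
The topic-with-two-subscriptions shape can be sketched in a few lines of Python. This is an in-memory stand-in, not the Azure SDK; the Topic class, the subscription names and the ping fields are all hypothetical and only show that every subscription receives its own copy of each ping.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a pub/sub topic (hypothetical, not an Azure API)."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}

    def subscribe(self, subscription_name):
        # Each subscription gets its own independent queue.
        queue = deque()
        self.subscriptions[subscription_name] = queue
        return queue

    def publish(self, message):
        # Fan out: every subscription receives its own copy of the message.
        for queue in self.subscriptions.values():
            queue.append(dict(message))


topic = Topic("BeaconPing")
attendance_queue = topic.subscribe("attendance")
audit_queue = topic.subscribe("audit")

# Hypothetical ping payload: who was seen, and where.
topic.publish({"student_id": "S123", "building": "ENG", "room": "101"})

print(len(attendance_queue), len(audit_queue))
```

Adding a new processor later is just another subscribe call; the publisher does not change.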

Taking Attendance

Taking attendance is pretty simple. For every ping, we look up a student’s schedule based on the StudentId inside the ping and see if the beacon’s location matches the classroom on their schedule. If we have a match, then we mark them as attended. If not, we throw the ping away and move on. Students can easily come into contact with beacons from other classrooms so we just handle that case by ignoring the ping. The attendance module is idempotent so duplicates are not a problem.
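
The matching rule above can be sketched roughly like this in Python. The schedule layout, field names and session identifiers are made up for illustration, and a real schedule lookup would also take the time of day into account.

```python
# Hypothetical schedule: student id -> {(building, room): class session}.
SCHEDULE = {
    "S123": {("ENG", "101"): "CS101-week1-mon"},
}

# Attendance is a set of (student, session) pairs, so marking is idempotent.
attended = set()


def process_ping(ping, schedule, attended):
    """Mark attendance if the beacon location is on the student's schedule."""
    sessions = schedule.get(ping["student_id"], {})
    session = sessions.get((ping["building"], ping["room"]))
    if session is None:
        return False  # beacon from another classroom or unknown student: ignore
    attended.add((ping["student_id"], session))  # duplicates change nothing
    return True


match = {"student_id": "S123", "building": "ENG", "room": "101"}
stray = {"student_id": "S123", "building": "ENG", "room": "202"}

first = process_ping(match, SCHEDULE, attended)
second = process_ping(match, SCHEDULE, attended)   # duplicate ping
ignored = process_ping(stray, SCHEDULE, attended)  # neighbouring classroom

print(first, second, ignored, len(attended))
```

Storing attendance as a set of pairs is one simple way to get the idempotent behavior: processing the same ping twice leaves a single record.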

Auditing

The audit subscription will subscribe to every ping and put the pings into Hadoop for future analysis. I have no doubt our computer science students will have some fun with that data in the years to come.

The Future

One of the benefits of using PubSub like this is flexibility. With each ping identifying the student and a location on campus, the opportunities are endless. One scenario we like to talk about is placing a beacon on our school mascot and running reports to see what part of the student body came in contact with our furry friend. I don’t know why we would want to do that, but we could. Beacons don’t have to be stationary either.

The Endless Possibilities

The main reasons I like this system and setup are its flexibility and the fact that we don’t have to change the code on the mobile app when we add a new type of beacon or beacon processor. We just put more beacons on campus and another processor in the cloud and we’re done. See, like I said in the beginning of this post, I’m not just a pretty interface!

One thing to note here is that some students might not have a phone or might not wish to be tracked in this manner. Our students using the app have a choice to enable or disable this feature, and we still have to have a manual way of taking attendance at the school. After all, some students may have a Windows or BlackBerry phone! 😉