fixes #10 initial docs for encoding standard

keith-turner · keith-turner · commit ccafe5a45bae · 2018-03-10T22:16:21.000-05:00
diff --git a/README.md b/README.md
@@ -2,23 +2,6 @@
 
 Ecoji encodes data as emojis.  As a bonus, includes code to decode emojis to original data. 
 
-## Build instructions.
-
-This is my first Go project, I am starting to get my bearings. If you are new
-to Go I would recommend this [video] and the [tour].
-
-```bash
-# The following are general Go setup instructions.  Ignore if you know Go, I am new to it.
-export GOPATH=~/go
-export PATH=$GOPATH/bin:$PATH
-
-# This will download Ecoji to $GOPATH/src
-go get github.com/keith-turner/ecoji
-
-# This will build the ecoji command and put it in $GOPATH/bin
-go install github.com/keith-turner/ecoji/cmd/ecoji
-```
-
 ## Examples of running
 
 Encode example :
@@ -97,20 +80,31 @@ Options:
     -v, --version         Print version information.
 ```
 
-## Library
+## Build instructions.
 
-Ecoji offers a Go library package with two functions `ecoji.Encode()` and `ecoji.Decode()`.
+This is my first Go project, and I am still getting my bearings. If you are new
+to Go I would recommend this [video] and the [tour].
 
-## Technical details
+```bash
+# The following are general Go setup instructions.  Ignore if you know Go, I am new to it.
+export GOPATH=~/go
+export PATH=$GOPATH/bin:$PATH
+
+# This will download Ecoji to $GOPATH/src
+go get github.com/keith-turner/ecoji
 
-Encoding works by repeatedly reading 10 bits from the input.  Every 10 bit
-integer has a unique [Unicode emoji][emoji] character assigned to it.  So for
-each 10 bit integer, its assigned emoji is output as utf8.  To decode, this
-process is reversed.
+# This will build the ecoji command and put it in $GOPATH/bin
+go install github.com/keith-turner/ecoji/cmd/ecoji
+```
 
-Ecoji is base1024 using a subset of emojis as its numerals.
+## Libraries
 
+Libraries [implementing](docs/encoding.md) the Ecoji encoding standard. Submit a PR to add a library to the table.
 
+| Language | Link | Comments
+|----------|------|----------
+| Go       |      | This repository offers a Go library package with two functions [ecoji.Encode()](encode.go) and [ecoji.Decode()](decode.go).
+| Java     |      | Coming soon. I plan to implement this and publish it to Maven Central unless someone else does.
 
 [emoji]: https://unicode.org/emoji/
 [video]: https://www.youtube.com/watch?v=XCsL89YtqCs
diff --git a/docs/encoding.md b/docs/encoding.md
@@ -0,0 +1,84 @@
+# Ecoji encoding standard
+
+Ecoji maps input data onto 1024 Unicode emojis (plus 5 padding emojis).  Ten
+bits are needed to represent 1024 values. Ecoji reads 5 bytes at a time because
+that is 40 bits, which is a multiple of 10.  For each 5 bytes read, 4 emojis are
+output.  When fewer than 5 bytes are available, special padding emojis are
+output.  The 1024 emojis and the padding emojis are defined in
+[mapping.go](../mapping.go).  Implementations in other languages should use these same emojis.
+
+Below is some pseudo code for translating bytes to emojis.  Also see [encode.go](../encode.go).
+
+```java
+
+Input input; // input data, read 5 bytes at a time
+Output output; // where unicode emojis are written to
+byte data[5]; //buffer that bytes are read into
+int numRead;
+
+// assume read() returns as much data as is available, up to five bytes
+while ((numRead = input.read(data)) > 0) {
+   for(int i = numRead; i < 5; i++) {
+     //zero out unread data
+     data[i] = 0;
+   }
+
+   switch (numRead) {
+      case 1:
+        output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+        output.writeUnicode(padding);
+        output.writeUnicode(padding);
+        output.writeUnicode(padding);
+        break;
+      case 2:
+        output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+        output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+        output.writeUnicode(padding);
+        output.writeUnicode(padding);
+        break;
+      case 3:
+        output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+        output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+        output.writeUnicode(emojis[(data[2] & 0x0f)<<6 | data[3]>>2]);
+        output.writeUnicode(padding);
+        break;
+      case 4:
+        output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+        output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+        output.writeUnicode(emojis[(data[2] & 0x0f)<<6 | data[3]>>2]);
+        
+        //look at last two bits of 4th byte to determine padding to use
+        switch (data[3] & 0x03) {
+           case 0:
+             output.writeUnicode(padding40);
+             break;
+           case 1:
+             output.writeUnicode(padding41);
+             break;
+           case 2:
+             output.writeUnicode(padding42);
+             break;
+           case 3:
+             output.writeUnicode(padding43);
+             break;
+        }
+        break;
+
+      case 5:
+        // use 8 bits from 1st byte and 2 bits from 2nd byte to lookup emoji
+        output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+        // use 6 bits from 2nd byte and 4 bits from 3rd byte to lookup emoji
+        output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+        // use 4 bits from 3rd byte and 6 bits from 4th byte to lookup emoji
+        output.writeUnicode(emojis[(data[2] & 0x0f)<<6 | data[3]>>2]);
+        // use 2 bits from 4th byte and 8 bits from 5th byte to lookup emoji
+        output.writeUnicode(emojis[(data[3] & 0x03)<<8 | data[4]]);
+        break;
+   }
+}
+
+```
+  
+For decoding, see [decode.go](../decode.go).  The code needs to be cleaned up; it was written while learning Go.
+
+
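The bit arithmetic in the `case 5` branch of the pseudo code in docs/encoding.md, and its inverse for decoding, can be sketched in Go as follows. This is a minimal illustration, not the code in encode.go or decode.go: the names `pack` and `unpack` are hypothetical, and the sketch covers only a full 5-byte group, leaving out the emoji table lookup and all padding cases.

```go
package main

import "fmt"

// pack splits 5 bytes (40 bits) into four 10-bit indices, following the
// "case 5" branch of the pseudo code. pack is a hypothetical helper name.
func pack(b [5]byte) [4]uint16 {
	return [4]uint16{
		// 8 bits from 1st byte, 2 bits from 2nd byte
		uint16(b[0])<<2 | uint16(b[1])>>6,
		// 6 bits from 2nd byte, 4 bits from 3rd byte
		uint16(b[1]&0x3f)<<4 | uint16(b[2])>>4,
		// 4 bits from 3rd byte, 6 bits from 4th byte
		uint16(b[2]&0x0f)<<6 | uint16(b[3])>>2,
		// 2 bits from 4th byte, 8 bits from 5th byte
		uint16(b[3]&0x03)<<8 | uint16(b[4]),
	}
}

// unpack reverses pack: it reassembles 5 bytes from four 10-bit indices.
// unpack is a hypothetical helper name and assumes every index is < 1024.
func unpack(idx [4]uint16) [5]byte {
	return [5]byte{
		byte(idx[0] >> 2),
		byte(idx[0]&0x03)<<6 | byte(idx[1]>>4),
		byte(idx[1]&0x0f)<<4 | byte(idx[2]>>6),
		byte(idx[2]&0x3f)<<2 | byte(idx[3]>>8),
		byte(idx[3]),
	}
}

func main() {
	in := [5]byte{0xFF, 0x00, 0xFF, 0x00, 0xFF}
	idx := pack(in)
	fmt.Println(idx)         // prints [1020 15 960 255]
	fmt.Println(unpack(idx)) // prints [255 0 255 0 255]
}
```

Each index falls in the range 0 to 1023, so it can be used directly as a position in a 1024-entry emoji table such as the one the docs say is defined in mapping.go.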