
Commit ccafe5a

committed Mar 11, 2018
fixes #10 initial docs for encoding standard
1 parent ec9fc5c commit ccafe5a

2 files changed: +103 -25 lines changed


‎README.md

+19 -25
@@ -2,23 +2,6 @@
 
 Ecoji encodes data as emojis. As a bonus, includes code to decode emojis to original data.
 
-## Build instructions.
-
-This is my first Go project, I am starting to get my bearings. If you are new
-to Go I would recommend this [video] and the [tour].
-
-```bash
-# The following are general Go setup instructions. Ignore if you know Go, I am new to it.
-export GOPATH=~/go
-export PATH=$GOPATH/bin:$PATH
-
-# This will download Ecoji to $GOPATH/src
-go get github.com/keith-turner/ecoji
-
-# This will build the ecoji command and put it in $GOPATH/bin
-go install github.com/keith-turner/ecoji/cmd/ecoji
-```
-
 ## Examples of running
 
 Encode example :
@@ -97,20 +80,31 @@ Options:
 -v, --version Print version information.
 ```
 
-## Library
+## Build instructions.
 
-Ecoji offers a Go library package with two functions `ecoji.Encode()` and `ecoji.Decode()`.
+This is my first Go project, I am starting to get my bearings. If you are new
+to Go I would recommend this [video] and the [tour].
 
-## Technical details
+```bash
+# The following are general Go setup instructions. Ignore if you know Go, I am new to it.
+export GOPATH=~/go
+export PATH=$GOPATH/bin:$PATH
+
+# This will download Ecoji to $GOPATH/src
+go get github.com/keith-turner/ecoji
 
-Encoding works by repeatedly reading 10 bits from the input. Every 10 bit
-integer has a unique [Unicode emoji][emoji] character assigned to it. So for
-each 10 bit integer, its assigned emoji is output as utf8. To decode, this
-process is reversed.
+# This will build the ecoji command and put it in $GOPATH/bin
+go install github.com/keith-turner/ecoji/cmd/ecoji
+```
 
-Ecoji is base1024 using a subset of emojis as its numerals.
+## Libraries
 
+Libraries [implementing](docs/encoding.md) the Ecoji encoding standard. Submit PR to add a library to the table.
 
+| Language | Link | Comments
+|----------|------|----------
+| Go | | This repository offers a Go library package with two functions [ecoji.Encode()](encode.go) and [ecoji.Decode()](decode.go).
+| Java | | Coming soon, I plan to implement this and publish to maven central unless someone else does.
 
 [emoji]: https://unicode.org/emoji/
 [video]: https://www.youtube.com/watch?v=XCsL89YtqCs

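The "Technical details" paragraph that this commit removes from the README describes encoding as repeatedly reading 10 bits from the input. A minimal Go sketch of that bit-splitting step follows. It assumes most-significant-bit-first order (consistent with the shifts in docs/encoding.md) and zero-fills the final partial group. The name `tenBitGroups` is hypothetical and not part of the ecoji package, and the sketch does not emit the padding emojis that the encoding standard requires for short input.

```go
package main

import "fmt"

// tenBitGroups splits data into consecutive 10-bit integers, most significant
// bit first. Hypothetical helper for illustration only: the last group is
// zero-filled on the right, and no Ecoji padding emojis are produced.
func tenBitGroups(data []byte) []uint16 {
	var groups []uint16
	var acc uint32 // bit accumulator holding bits not yet emitted
	nbits := 0     // number of valid low-order bits in acc
	for _, b := range data {
		acc = acc<<8 | uint32(b)
		nbits += 8
		for nbits >= 10 {
			nbits -= 10
			groups = append(groups, uint16(acc>>uint(nbits))&0x3ff)
		}
		acc &= (1 << uint(nbits)) - 1 // keep only the leftover bits
	}
	if nbits > 0 {
		groups = append(groups, uint16(acc<<uint(10-nbits))&0x3ff)
	}
	return groups
}

func main() {
	// Five bytes are 40 bits, so exactly four 10-bit integers come out.
	fmt.Println(tenBitGroups([]byte{0xff, 0xc0, 0x00, 0x00, 0x01}))
}
```

Each resulting integer lies in the range 0 to 1023, which is why the README calls Ecoji base1024.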
‎docs/encoding.md

+84
@@ -0,0 +1,84 @@
+# Ecoji encoding standard
+
+Ecoji maps input data into 1024 Unicode emojis (plus 5 padding emojis). Ten
+bits are needed to index 1024 emojis. Ecoji reads 5 bytes at a time because this
+is 40 bits, which is a multiple of 10. For each 5 bytes read, 4 emojis are
+output. When fewer than 5 bytes are available, special padding emojis are
+output. In [mapping.go](../mapping.go) the 1024 emojis and the padding emojis
+are defined. These same emojis should be used in other languages.
+
+Below is some pseudo code for translating bytes to emojis. Also see [encode.go](../encode.go).
+
+```java
+
+Input input; //input data, read from 5 bytes at a time
+Output output; // where unicode emojis are written to
+byte data[5]; //buffer that bytes are read into
+int numRead;
+
+//assume this reads the maximum available data, up to five bytes
+while ((numRead = input.read(data)) > 0) {
+  for(int i = numRead; i < 5; i++) {
+    //zero out unread data
+    data[i] = 0;
+  }
+
+  switch (numRead) {
+  case 1:
+    output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+    output.writeUnicode(padding);
+    output.writeUnicode(padding);
+    output.writeUnicode(padding);
+    break;
+  case 2:
+    output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+    output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+    output.writeUnicode(padding);
+    output.writeUnicode(padding);
+    break;
+  case 3:
+    output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+    output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+    output.writeUnicode(emojis[(data[2] & 0x0f)<<6 | data[3]>>2]);
+    output.writeUnicode(padding);
+    break;
+  case 4:
+    output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+    output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+    output.writeUnicode(emojis[(data[2] & 0x0f)<<6 | data[3]>>2]);
+
+    //look at last two bits of 4th byte to determine padding to use
+    switch (data[3] & 0x03) {
+    case 0:
+      output.writeUnicode(padding40);
+      break;
+    case 1:
+      output.writeUnicode(padding41);
+      break;
+    case 2:
+      output.writeUnicode(padding42);
+      break;
+    case 3:
+      output.writeUnicode(padding43);
+      break;
+    }
+    break;
+
+  case 5:
+    // use 8 bits from 1st byte and 2 bits from 2nd byte to lookup emoji
+    output.writeUnicode(emojis[data[0]<<2 | data[1]>>6]);
+    // use 6 bits from 2nd byte and 4 bits from 3rd byte to lookup emoji
+    output.writeUnicode(emojis[(data[1] & 0x3f)<<4 | data[2]>>4]);
+    // use 4 bits from 3rd byte and 6 bits from 4th byte to lookup emoji
+    output.writeUnicode(emojis[(data[2] & 0x0f)<<6 | data[3]>>2]);
+    // use 2 bits from 4th byte and 8 bits from 5th byte to lookup emoji
+    output.writeUnicode(emojis[(data[3] & 0x03)<<8 | data[4]]);
+    break;
+  }
+}
+
+```
+
+For decoding, see [decode.go](../decode.go). The code needs to be cleaned up; it was written while learning Go.
+
+

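The shifts and masks in case 5 of the pseudo code can be checked with a small round trip. The Go sketch below packs a full 5-byte group into four 10-bit indices and reverses the process. The names `pack`, `unpack`, and `encodedLen` are hypothetical and do not come from encode.go or decode.go. The emoji table in mapping.go is not reproduced, so the sketch works with table indices only. Go's `byte` is unsigned, so the right shifts behave as the pseudo code intends; a language with signed bytes, such as Java, would need to mask each byte with `0xff` first.

```go
package main

import "fmt"

// pack converts a full 5-byte group into four 10-bit emoji table indices,
// mirroring case 5 of the pseudo code. Hypothetical helper for illustration.
func pack(d [5]byte) [4]int {
	return [4]int{
		int(d[0])<<2 | int(d[1])>>6,      // 8 bits of byte 1, 2 bits of byte 2
		int(d[1]&0x3f)<<4 | int(d[2])>>4, // 6 bits of byte 2, 4 bits of byte 3
		int(d[2]&0x0f)<<6 | int(d[3])>>2, // 4 bits of byte 3, 6 bits of byte 4
		int(d[3]&0x03)<<8 | int(d[4]),    // 2 bits of byte 4, 8 bits of byte 5
	}
}

// unpack reverses pack, rebuilding the 5 bytes from four 10-bit indices.
func unpack(idx [4]int) [5]byte {
	return [5]byte{
		byte(idx[0] >> 2),
		byte(idx[0]&0x03)<<6 | byte(idx[1]>>4),
		byte(idx[1]&0x0f)<<4 | byte(idx[2]>>6),
		byte(idx[2]&0x3f)<<2 | byte(idx[3]>>8),
		byte(idx[3]),
	}
}

// encodedLen returns the number of emojis written for n input bytes. Every
// group of up to 5 bytes yields 4 emojis, counting padding emojis.
func encodedLen(n int) int {
	return (n + 4) / 5 * 4
}

func main() {
	in := [5]byte{'h', 'e', 'l', 'l', 'o'}
	idx := pack(in)
	fmt.Println(idx, unpack(idx) == in, encodedLen(7))
}
```

Partial groups (1 to 4 bytes) are not handled by `unpack`; a decoder would also have to recognize the padding emojis to recover the original length.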