<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>awm's blog</title>
<link>https://ders.github.io/index.xml</link>
<description>Recent content on awm's blog</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Tue, 28 Feb 2017 15:51:00 +0900</lastBuildDate>
<atom:link href="https://ders.github.io/index.xml" rel="self" type="application/rss+xml" />
<item>
<title>BigQuery</title>
<link>https://ders.github.io/post/2017-02-28-big-query/</link>
<pubDate>Tue, 28 Feb 2017 15:51:00 +0900</pubDate>
<guid>https://ders.github.io/post/2017-02-28-big-query/</guid>
<description>
<p>A recent project involves a script to do a regular data slurp, process it, and write the results to Google BigQuery. The script runs once an hour via cron.</p>
<p>The data slurp is such that my script always requests a specific range of data. Thus if processing fails at any point, I can easily re-slurp the same data and process it again.</p>
<p>On the BigQuery side, however, I need to ensure data integrity. I need to ensure that no data are lost, nor are any data inserted twice. This must be accomplished without ever querying the BigQuery tables, as queries are expensive.</p>
<p>The strategy then is to fail hard during the data slurp and processing phases, so that if something goes wrong, nothing goes into BigQuery, and we try again in an hour. This works well for recovering from the occasional communication errors encountered during the data slurp.</p>
<p>On the other hand, an error during the BigQuery insert phase must not fail hard, as that would leave us in an indeterminate state of having some of our data written. Instead, BigQuery inserts that fail should be retried and retried again until they succeed. (Of course I need to make sure that the failures we&rsquo;re retrying are transient, but that&rsquo;s a separate topic.)</p>
<h2 id="the-incident">The Incident</h2>
<p>Today in the log I found an &ldquo;unknown error&rdquo; entry, which means that something raised an exception in an unexpected place.</p>
<p>Inspecting the log file, I saw that one of the BigQuery insert calls had encountered a 500 (service temporarily unavailable) response. This was supposed to trigger an automatic retry, but the retry failed on account of one line of errant logging code. The script failed hard and marked the job as not done, even though several thousand rows had already made it into BigQuery.</p>
<p>On the next run an hour later, the script dutifully played catch-up, reprocessing the data that had gone astray and inserting it, this time successfully, into BigQuery.</p>
<p>So no data have been lost, but I&rsquo;ve failed at preventing duplication.</p>
<p>Fortunately, we have have a timestamp on every insert, so it should be a relatively simple matter to manually delete everything that was inserted at that particular hour.</p>
<p>So imagine my surprise and confusion when I discovered that there were exactly zero records timstamped in that range. The logger clearly showed several batches of 500 inserts successfully completed before the crash; where had all the records gone?</p>
<p>As it turns out, it&rsquo;s the <a href="https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency">insert ID</a> that saved us. Each data point is sent with a unique insert ID which is generated as a function of the data itself. When BigQuery received insert IDs that it had seen before, it silently deduped the data for us.</p>
<p>Two observations to note:</p>
<ul>
<li>The documentation states that BigQuery will remember the insert IDs for &ldquo;at least one minute.&rdquo; In our case, the duplicate data showed up an hour later and was still detected.</li>
<li>The deduping resulted in the earlier inserts being discarded and the later inserts being kept.</li>
</ul>
<p>I&rsquo;ve fixed the errant logging code, by the way.</p>
</description>
</item>
<item>
<title>A lean, clean Golang machine</title>
<link>https://ders.github.io/post/2016-12-23-lean-clean-golang-machine/</link>
<pubDate>Fri, 23 Dec 2016 14:49:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-12-23-lean-clean-golang-machine/</guid>
<description>
<p>Writing a <a href="https://golang.org/">Go</a> package that interacts with a relational
data store such as Postgres is full of messiness.</p>
<p>Those of us who appreciate the strong-typedness of Go probably also appreciate
the strong-typedness of SQL, and vice versa. Unfortunately, communication
between Go and SQL is less than ideal. This is due partly to the mostly
free-form text format of data exchange (queries) and partly to some subtle
differences in data types.</p>
<p>Database nulls are a particular headache, leading to the contortions of defining
types such <a href="https://golang.org/pkg/database/sql/#NullString">NullString</a>,
<a href="https://golang.org/pkg/database/sql/#NullInt64">NullInt64</a>, and
<a href="https://golang.org/pkg/database/sql/#NullBool">NullBool</a>, and an extra check
is required every time you want distinguish a null from a zero value.</p>
<p>Why not use an ORM? There has been
<a href="http://www.hydrogen18.com/blog/golang-orms-and-why-im-still-not-using-one.html">a lot written</a>
<a href="https://blog.codinghorror.com/object-relational-mapping-is-the-vietnam-of-computer-science/">on this already</a>,
but in a nutshell, the level of generality required means that
<a href="https://godoc.org/github.com/jinzhu/gorm">pretty much everything is an interface{}</a>
with runtime checks to cast stuff into the types you need, and at this point
we&rsquo;ve lost the benefits of Go&rsquo;s strong typing and may as well write our whole
application in Ruby.</p>
<p>I&rsquo;ve found that programmers who appreciate the power and control that comes from
writing in a low-level compiled language such as Go also appreciate the power
and control that comes from writing queries yourself in SQL.</p>
<h2 id="so-what-s-the-problem-really">So what&rsquo;s the problem, really?</h2>
<p>The real headache of <a href="https://golang.org/pkg/database/sql/">Go + SQL</a> is the
volume of boilerplate code that goes with even relatively simple operations.</p>
<p>(1) Run a query that doesn&rsquo;t return any results.</p>
<pre><code>_, err := db.Exec(query, ...args)
if err != nil {
return err
}
</code></pre>
<p>(1a) Run a query that doesn&rsquo;t return any results, but we want to know how many
rows were changed.</p>
<pre><code>res, err := db.Exec(query, ...args)
if err != nil {
return err
}
count, err := res.RowsAffected()
if err != nil {
return err
}
</code></pre>
<p>(1b) Run a query that doesn&rsquo;t return any results, and we&rsquo;d like to catch and
process integrity violations (e.g. duplicate entry on a unique field). This one
requires some database-specific code; the example here is for Postgres.</p>
<pre><code>_, err := db.Exec(query, ...args)
duplicate := false
if err != nil {
if pgerr, ok := err.(*pq.Error); ok {
duplicate = pgerr.Code.Class().Name() == &quot;integrity_constraint_violation&quot;
}
if !duplicate {
return err
}
}
</code></pre>
<p>(1c) Run a query that doesn&rsquo;t return any results, and we&rsquo;d like to catch and
process data exceptions (e.g. number out of range). This uses the same strategy as 1b and can be combined with it.</p>
<p>(2) Run a query that returns one row.</p>
<pre><code>err := db.QueryRow(query, ...args).Scan(&amp;arg1, &amp;arg2, ... )
if err != nil {
return err
}
</code></pre>
<p>(2a) Run a query that returns one row, and we&rsquo;d like to catch and process the
case where no rows are returned.</p>
<pre><code>err := db.QueryRow(query, ...args).Scan(&amp;arg1, &amp;arg2, ... )
noRows := err == ErrNoRows
if err != nil &amp;&amp; !noRows {
return err
}
</code></pre>
<p>(3) Run a query that returns multiple rows.</p>
<pre><code>rows, err := db.Query(query, ...args)
if err != nil {
return err
}
defer rows.Close()
for rows.Next() {
err := rows.Scan(&amp;arg1, &amp;arg2, ... )
if err != nil {
return err
}
}
err = rows.Err()
if err != nil {
return err
}
</code></pre>
<p>None of these is particularly bad as far as boilerplate goes, but unless we&rsquo;re
writing an ORM (and we&rsquo;ve already decided we&rsquo;re not), we&rsquo;re going to have tens,
perhaps hundreds of these scattered throughout our application. Add to that
another <code>if err != nil</code> every time we start a transaction, and I&rsquo;m thinking
there&rsquo;s got to be a better way.</p>
<h2 id="organizing-database-access-around-high-level-functionality">Organizing database access around high-level functionality</h2>
<p>We would like to follow the
<a href="http://martinfowler.com/eaaCatalog/unitOfWork.html">unit of work</a>
pattern and create something akin to the
<a href="http://docs.sqlalchemy.org/en/latest/orm/session_basics.html">session model</a>
of SQLAlchemy.</p>
<p>A simple example of a unit of work is a password reset, which checks for an
email match, and then generates, saves, and returns a reset code. This will
involve a minimum of two queries, which need to be in the same transaction.
(Much more complicated units of work are possible, of course, both read-only
and read-write.)</p>
<p>Our goal then is to find a way to have just one copy of all the boilerplate above and be able to substitute queries and argument lists as needed.</p>
<p>I&rsquo;m going to propose that it&rsquo;s straightforward to implement such a thing Go
by defining a custom transaction handler which extends
<a href="https://golang.org/pkg/database/sql/#Tx">the one in database/sql</a>.
This is done within the package that uses it.</p>
<pre><code>type Tx struct {
sql.Tx
}
</code></pre>
<p>We extend <code>sql.Tx</code> with methods to (a) convert all database errors to panics so
that we can catch and process them all in one place, and (b) easily iterate over
result sets.</p>
<p>To accomplish (a), we add the methods <code>MustExec</code>, <code>MustQuery</code>, and <code>MustQueryRow</code>.
These are identical to <code>Exec</code>, <code>Query</code>, and <code>QueryRow</code> except that they panic
instead of returning an error code. Also, in the case of <code>MustQuery</code> and <code>MustQueryRow</code>,
they return custom <code>Rows</code> and <code>Row</code> objects that have similar extensions.</p>
<p>To accomplish (b), we add the method <code>Each</code> to the custom <code>Rows</code> object returned
by <code>MustQuery</code>. Method <code>Each</code> iterates over the result set and calls a
callback function for each row.</p>
<p>The <code>ourError</code> type is used to wrap errors that we want to convert back to error
codes. It distinguishes them from other kinds of panics (e.g. out of memory).</p>
<pre><code>type ourError struct {
err error
}
func (tx Tx) MustExec(query string, args ...interface{}) sql.Result {
res, err := tx.Exec(query, args...)
if err != nil {
panic(ourError{err})
}
return res
}
func (tx Tx) MustQuery(query string, args ...interface{}) *Rows {
rows, err := tx.Query(query, args...)
if err != nil {
panic(ourError{err})
}
return &amp;Rows{*rows}
}
func (tx Tx) MustQueryRow(query string, args ...interface{}) *Row {
row := tx.QueryRow(query, args...)
return &amp;Row{*row}
}
</code></pre>
<p>The custom <code>Row</code> and <code>Rows</code> types are defined analogously.
<code>Row</code> is extended with a <code>MustScan</code> method:</p>
<pre><code>type Row struct {
sql.Row
}
func (row Row) MustScan(args ...interface{}) {
err := row.Scan(args...)
if err != nil {
panic(ourError{err})
}
}
</code></pre>
<p><code>Rows</code> is extended with a <code>MustScan</code> method and also with the <code>Each</code> iterator
described above.</p>
<pre><code>type Rows struct {
sql.Rows
}
func (rows Rows) MustScan(args ...interface{}) {
err := rows.Scan(args...)
if err != nil {
panic(ourError{err})
}
}
func (rows *Rows) Each(f func(*Rows)) {
defer rows.Close()
for rows.Next() {
f(rows)
}
err := rows.Err()
if err != nil {
panic(ourError{err})
}
}
</code></pre>
<p>Now to make it all work, we define a custom transaction function. It
sets up the transaction, provides the custom transaction handler to our
callback, and then catches the panics.</p>
<pre><code>func Xaction(db *sql.DB, f func(*Tx)) (err error) {
var tx *sql.Tx
tx, err = db.Begin()
if err != nil {
return
}
defer func() {
if r := recover(); r != nil {
if ourerr, ok := r.(ourError); ok {
// This panic is from tx.Fail() or the equivalent. Unwrap it,
// process it, and return it as an error code.
tx.Rollback()
err = ourerr.err
if err == sql.ErrNoRows {
err = ErrDoesNotExist
} else if pgerr, ok := err.(*pq.Error); ok {
switch pgerr.Code.Class().Name() {
case &quot;data_exception&quot;:
err = ErrInvalidValue
case &quot;integrity_constraint_violation&quot;:
// This could be lots of things: foreign key violation,
// non-null constraint violation, etc., but we're generally
// checking those in advance. As long as our code is in
// order, unique constraints will be the only things we're
// actually relying on the database to check for us.
err = ErrDuplicate
}
}
} else {
// not our panic, so propagate it
panic(r)
}
}
}()
f(&amp;Tx{*tx}) // this runs the queries
tx.Commit()
return
}
</code></pre>
<p>This covers all of our boilerplate needs except for (1a) above.
To accommodate (1a), we could extend <code>sql.Result</code> the same way we extended
the others, but I haven&rsquo;t really needed it yet, so I&rsquo;ll leave it as an
exercise for the reader.</p>
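<p>For the impatient reader, here&rsquo;s one possible shape, a sketch that skips wrapping <code>sql.Result</code> and just follows the pattern of the other helpers (the name is mine, not the post&rsquo;s):</p>
<pre><code>// MustExecCount runs a query and returns the number of rows affected,
// panicking on error like the other Must* methods.
func (tx Tx) MustExecCount(query string, args ...interface{}) int64 {
	res, err := tx.Exec(query, args...)
	if err != nil {
		panic(ourError{err})
	}
	count, err := res.RowsAffected()
	if err != nil {
		panic(ourError{err})
	}
	return count
}
</code></pre>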
<p>One final method that&rsquo;s there just to make everything neat and tidy is a <code>Fail</code>
method on the transaction which can be used to return an arbitrary error.</p>
<pre><code>func (tx Tx) Fail(err error) {
panic(ourError{err})
}
</code></pre>
<h2 id="the-result">The result</h2>
<p>Our application code is now a lot neater.</p>
<pre><code>err := Xaction(func(tx *Tx) {
// Run a query that doesn't return any results.
tx.MustExec(query1, args...)
// Run a query that returns one row.
tx.MustQueryRow(query2, args...).MustScan(&amp;arg1, &amp;arg2, ... )
// Run a query that returns multiple rows.
tx.MustQuery(query3, args...).Each(func(r *Rows) {
r.MustScan(&amp;arg1, &amp;arg2, ... )
})
})
if err != nil {
switch err {
case ErrDoesNotExist:
// query2 returned no rows
case ErrInvalidValue:
// data exception
case ErrDuplicate:
// integrity violation
default:
return err
}
}
</code></pre>
<p>And since this is an extension to the stock transaction handler rather than
a replacement for it, we can still use the original non-must methods for
any edge case that might require a different kind of error handling.</p>
</description>
</item>
<item>
<title>CRUD APIs are crud</title>
<link>https://ders.github.io/post/2016-06-21-crud-is-crud/</link>
<pubDate>Tue, 21 Jun 2016 17:20:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-06-21-crud-is-crud/</guid>
<description>
<h2 id="crud-apis-are-crud">CRUD APIs are crud</h2>
<p>I&rsquo;m making the case specifically about <a href="http://web.archive.org/web/20130116005443/http://tomayko.com/writings/rest-to-my-wife">REST</a> APIs, but in fact everything here applies to any API, REST or not.</p>
<p>It&rsquo;s a common paradigm to create a data model as a collection of tables in a relational database and then access the data from some client app (mobile or web). <a href="https://en.wikipedia.org/wiki/Create,_read,_update_and_delete">CRUD</a> has become a popular way to access the data, perhaps because it&rsquo;s easy to make and easy to explain.</p>
<p>In CRUD, we&rsquo;re essentially giving the caller direct access to INSERT, SELECT, UPDATE and DELETE commands on our SQL database. Or something analogous if you&rsquo;re into NoSQL. It comes with some permissions checking, of course, but as far as the capabilities of the API, that&rsquo;s pretty much it.</p>
<p>The worst thing this does is expose the schema to the client, making it difficult to change the internal structure later on. Want to fix how tags are stored? Too bad, you&rsquo;re going to break the API.</p>
<p>Besides that, there&rsquo;s a lot of <a href="http://www.agiledata.org/essays/relationalDatabases.html#AdvancedFeatures">database capability</a> that&rsquo;s missing.</p>
<p>What happens when we have some business logic, e.g. in a stored procedure? We&rsquo;ll have to create a separate endpoint for that.</p>
<p>What happens when we have some limited resource that we need to allocate on a first-come, first-served basis, e.g. room reservations. Again, we need some special processing to ensure that only one of two simultaneous requests succeed.</p>
<p>What happens when we need some concept of transactions, that when a series of operations can&rsquo;t be completed we revert back to the original state? Once again, we need to handle this separately.</p>
<p>What happens when we need to enforce some consistency between tables? In the case of foreign key constraints, it&rsquo;s usually enough just to do the updates in the proper order, but other more complicated constraints will either need their own separate endpoints or will need to be momentarily violated. And being violated is never acceptable, even for just a moment.</p>
<p>The biggest problem with a CRUD API is that it&rsquo;s <a href="https://lostechies.com/chrispatterson/2014/01/03/crud-is-not-a-service/">shifting all the business logic to the caller</a>, whereas it should instead be invisible to the caller. Even Microsoft <a href="https://msdn.microsoft.com/en-us/library/ms954638.aspx#soade_topic3">recognized CRUD as an anti-pattern</a>, and that was way back in 2005. Even when we&rsquo;re only doing read and display, it&rsquo;s often necessary to make several API calls to produce one document, unnecessarily slowing down load times.</p>
<p>The second-biggest problem with a CRUD API is specific to the update operation. Update does not represent any realistic use case. When do you ever want to rewrite an entire database row? We carry this mistake all the way to the UI, where we press <code>edit</code> on our profile, get back all of our data in input fields, change one field, and then write everything back.</p>
<h2 id="apis-that-work">APIs that work</h2>
<p>I&rsquo;m proposing a way to approach APIs, a way that avoids the pitfalls of CRUD. If you&rsquo;re practicing <a href="http://dddcommunity.org/learning-ddd/what_is_ddd/">domain-driven design (DDD)</a>, this will happen naturally. (Side note: at our company, we&rsquo;ve been using DDD since day one, but no one here knew there was a buzzword for it.) None of what I&rsquo;m proposing is new or groundbreaking; it&rsquo;s just the way we should be doing things.</p>
<p>For read operations, there is one API call per display operation. Everything needed to render the requested view comes back as one bundle. Dynamic web content that&rsquo;s generated server-side is done this way, and the API can too. As a bonus, we can use the same API as internal for server-generated pages and as external for client-generated views.</p>
<p>For write operations, there is a one-to-one correspondence between a user action and an API call. On the backend, one API call is one transaction, and if any part fails, then the whole thing fails. (Side note: one should never, ever build a system where it&rsquo;s possible for only part of a user action to succeed. Usability nightmare.)</p>
<p>If we absolutely need some CRUD-style functionality (e.g. updating one&rsquo;s profile), we should make our updates one field at a time. Not only does this match more closely what the average user will be doing, but it gives us an easy way to manage concurrency: simply require an update call to specify both the old and new value. If the old value doesn&rsquo;t match, it&rsquo;s an error.</p>
<h2 id="tracking-changes-and-archiving">Tracking changes and archiving</h2>
<p>Tracking changes and archiving are two capabilities that are often added to a data store as an afterthought. I&rsquo;d like to be proactive and incorporate them into the data design from the beginning.</p>
<p>The simplest way to track changes is with created-at and updated-at fields on every db model, and most database engines have neat ways to auto-update these fields. This level of tracking is of limited use, however, as we don&rsquo;t know what changed or who changed it.</p>
<p>There are plenty of add-ons to do detailed revision tracking (<a href="https://django-reversion.readthedocs.io/en/stable/">django-reversion</a> is one I like), but I&rsquo;m a little bit concerned about the performance hit. Also, such add-ons make the created-at and updated-at fields redundant. That&rsquo;s probably a good thing.</p>
<p>As for archiving, a common technique is to add a boolean field called <code>archived</code> to every model you want to be able to archive. On this plus side, it&rsquo;s easy not to break references when you have non-archived data that refers to archived data, but <a href="https://en.wikipedia.org/wiki/Design_smell">we really shouldn&rsquo;t have that happening</a>. On the minus side, we end up adding <code>and not archived</code> to nearly every query.</p>
<p>We also might want to be able to permanently delete some archived material after a certain expiration time. We&rsquo;d then need an <code>archived_at</code> field as well.</p>
<p>Here&rsquo;s where CRUD fails again: Archive a record by setting <code>archived</code> to true and write it back. Unarchive it similarly. Determine the age of data by reading the created-at and updated-at fields on the model.</p>
<p>I propose that archiving and revision tracking can be implemented together in a way that&rsquo;s clean and transparent to the client.</p>
<p>Instead of adding extra fields to the models, all the archive and tracking information goes into a read/append-only journal, which may or may not be implemented as a database table.</p>
<p>The journal contains one entry for each user action (see above). If there are system actions (e.g. daily aggregations) that get written to the database, those get included as well. Each entry contains a before-and-after detail of all changes. Since this before-and-after detail will only ever be accessed as a whole, it&rsquo;s reasonable to make it one json bundle in a text field.</p>
<p>Archiving simply becomes a delete operation, as all the details are archived in the history. This means, of course, that related data needs to be archived together, which is a good thing. Furthermore, it&rsquo;s trivial to put a time limit on data retention; simply delete old journal entries.</p>
<p>My next API is going to rock.</p>
</description>
</item>
<item>
<title>The Django REST framework</title>
<link>https://ders.github.io/post/2016-04-14-django-rest-framework/</link>
<pubDate>Thu, 14 Apr 2016 10:22:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-04-14-django-rest-framework/</guid>
<description><p>I may have to reconsider choosing Go for some server applications.</p>
<p>There&rsquo;s a bit of a learning curve, but
version 3 of the <a href="http://www.django-rest-framework.org/">Django REST framework</a>
packs a lot of nice features.
The web browsable API is the one that won me over.</p>
</description>
</item>
<item>
<title>Why I code in Go for server applications</title>
<link>https://ders.github.io/post/2016-03-16-why-i-code-in-go/</link>
<pubDate>Wed, 16 Mar 2016 16:47:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-03-16-why-i-code-in-go/</guid>
<description><p>I&rsquo;ve written server applications in Ruby, Python, and Go. With Ruby I&rsquo;ve tried out both Sinatra and Rails; with Python I&rsquo;ve used Flask and Django; with Go I&rsquo;ve used the net/http package.</p>
<p>There are endless arguments for and against using this framework or that language, and there are many valid reasons to like or dislike a set of tools. I personally like Django a lot. But Go has two features that beat the competition when it comes to writing web services: static typing and explicit error handling.</p>
<p>In Ruby, we often find ourselves having to check if a value is nil before processing it. Anything can be nil, and unexpected inputs often create nil values where we least expect. If we forget to check just one place in the code, sooner or later it shows up as a 500 error and our service is <a href="https://www.youtube.com/watch?v=nZiDS-4Xd2k">broken</a>.</p>
<p>Test suites should cover this, but it&rsquo;s just as easy to miss one edge case in a test suite as it is to miss one in the main code.</p>
<p>In Go, nothing can be nil (unless it&rsquo;s a pointer, but it&rsquo;s easy to know when a pointer might not have been initialized). In the case of unexpected input, a variable is set to its zero value (e.g. 0, &#39;&#39;, {}), and the fact that there was unexpected input <a href="http://dave.cheney.net/2015/01/26/errors-and-exceptions-redux">is conveyed separately</a>.</p>
<p>In Python, we often find ourselves having to convert types, especially in the case of numerical inputs into string variables. Using a string where an int is required will raise a TypeError exception, and casting a non-numeric string to int will raise a ValueError exception. Here too, it&rsquo;s all too easy to miss one try-except block and get a 500 error.</p>
<p>Again, test suites should cover this, but that means a test for every possible branch in the code. Again, it&rsquo;s just as easy to miss one edge case in a test suite as it is to miss one in the main code.</p>
<p>In Go, compatible types are checked at compile time, thereby eliminating this source of errors.</p>
<p>I choose Go for the simple reason that most 500-inducing code bugs can be either caught at compile time or avoided entirely. The result is faster and more stable deployments than the alternatives.</p>
<p>OK, I lied. I choose Go because I like it. But this is a great way to justify my choice.</p>
</description>
</item>
<item>
<title>A simple sentiment analysis of two US presidential candidates</title>
<link>https://ders.github.io/post/2016-02-18-a-simple-sentiment-analysis/</link>
<pubDate>Wed, 03 Feb 2016 17:51:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-02-18-a-simple-sentiment-analysis/</guid>
<description>
<p><strong>Goal:</strong> To do some basic <a href="https://en.wikipedia.org/wiki/Sentiment_analysis">sentiment analysis</a> on video content.</p>
<p><strong>Test cases:</strong> Two 5-minute clips of US presidential candidate speeches.</p>
<p><strong>Strategy:</strong></p>
<ul>
<li>Extract 5 minutes of audio from the beginning of each video.</li>
<li>Generate a transcript using a speech-to-text program.</li>
<li>Feed the transcript into a sentiment analyzer.</li>
</ul>
<h2 id="the-original-content">The original content</h2>
<iframe width="560" height="315" src="https://www.youtube.com/embed/qOQCw7Hcwic" frameborder="0" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/p5ZB8Lg1tcA" frameborder="0" allowfullscreen></iframe>
<h2 id="extracting-a-5-minute-audio-clip">Extracting a 5-minute audio clip</h2>
<p>There are many ways to do this. One way is to download the video using a browser add-on. Browser add-ons are easy to find but are also fickle, as they make it easy to download material in violation of copyright. And if you&rsquo;re downloading from YouTube, you&rsquo;re violating their terms of service, even if you&rsquo;re not infringing on copyright. (We maintain that this exercise falls under <a href="https://en.wikipedia.org/wiki/Fair_use">fair use</a>.)</p>
<p>Another way is to turn on audio capture while playing the video.</p>
<p>After the capture is complete, we&rsquo;ll want to convert to FLAC if we&rsquo;re not there already. We&rsquo;ll use <a href="https://ffmpeg.org/">ffmpeg</a> for this, e.g.:</p>
<pre><code>ffmpeg -i captured-content.mp4 captured-content.flac
</code></pre>
<p>And then to extract the first 5 minutes:</p>
<pre><code>flac --until=5:00 captured-content.flac -o five-minute-clip.flac
</code></pre>
<p>Both <code>ffmpeg</code> and <code>flac</code> are available via homebrew.</p>
<h2 id="generating-a-transcript">Generating a transcript</h2>
<p>The IBM Watson Developer Cloud has a <a href="http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/speech-to-text.html">speech-to-text</a> service which is available through an API and also has a <a href="https://speech-to-text-demo.mybluemix.net/">demo page</a>. In theory, one can get limited free access to the API after going through a mildly annoying sign-up process, but in practice I was unable to convince the API to accept the credentials I&rsquo;d obtained.</p>
<p>Fortunately, the demo page allows file uploads and produced the following transcripts from the content above:</p>
<p><strong>Trump</strong></p>
<blockquote>
<p>This is no way I&rsquo;m leaving South Carolina. And I was gonna leave for tonight come back as it upsets up saying you have a five days we got a win on Saturday we&rsquo;re going to win. Make America great again we&rsquo;re gonna make America. We&rsquo;re going to win. You know. It&rsquo;s been an amazing friend Ruben all over the state today I love you too girly looking out dnmt love you I love you all. I love you. So many things are happening for a country. And it&rsquo;s this is a movement time magazine last week at the most beautiful story cover. And they talk about its improvement they&rsquo;ve never seen they say there&rsquo;s not been anything like this I don&rsquo;t know ever but they actually say ever. We went to Tampa the other day Tampa Florida would like two days notice fifteen thousand people that are turned away five thousand and by the way for all of the people in this room I can&rsquo;t believe it this is a huge room but downstairs to filling up another one and there sadly sending people away we don&rsquo;t like that right. No okay why don&rsquo;t we all get up go let&rsquo;s have that now. Now we have one of the great veterans outside I&rsquo;ll stand up while one of the great great you are great. Love this guy. He loves the veterans and I love the veterans are we going to take care of our veterans I&rsquo;ll tell you that we&rsquo;re going to take it did not. They are not properly taken care of so we&rsquo;re going to take a right we have sent a look at this. I knew you guys would say that I can spot a veteran a long ways off. But we are we going to take a break here we&rsquo;re gonna take you have a military we&rsquo;re going to take you have a military because our military is being whittled away whittled away we&rsquo;re going to make our military so big so strong so powerful nobody&rsquo;s going to mess with us anymore nobody nobody. Nobody. So Nikki Haley a very nice woman she better speech the other day you saw that and she was talking about anger and she said there&rsquo;s a lot of anger and I guess she was applying all of us you know really referring to us. And by the end of the day she was actually saying that Donald Trump is a friend of mine he&rsquo;s been a supporter of mine everything else you know the tone at the beginning was designed by the time that she was just barraged with people she said I think we better change our path here. And by the end of the day and it was fun it was great but she said you know there is anger but I said there is a group and I was asked during not this debate but the previous debate I was asked. I by the way did you love this last debate dnmt. Listen to like. They came at me from every angle possible. Don&rsquo;t know they came out before every angle you know sort of interesting. They were hitting me with things like and such until as you know I never realized I&rsquo;ve always don&rsquo;t politicians it is honest but I&rsquo;ve never known the level of dishonesty. And I deal in industries a lot of different but mostly real estate and like in Manhattan at different places but I&rsquo;ve never seen people as dishonest as politicians. They will say anything. Like okay so a lot of you people understand that you you get when you&rsquo;ve seen the speeches that you see in a lot of it and you know that I protect the second amendment more than anybody by far dnmt more than. And this guy Ted Cruz gets upset Donald Trump does not respect a second amendment and the more that anybody I&rsquo;m with the second amendment. 
I saw no no it&rsquo;s lies. And then they do commercials and you know he did it to Ben Carson and him in particular in all fairness. Jeff is represents but these are minor misrepresentations and he&rsquo;s not going anywhere anyway so what would how casual. Not as. Well Jeb was talking about eminent domain Donald Trump used eminent domain privately then I see there&rsquo;s a big story I had to bring this out luck. Proof Jeb bush under eminent domain took a disabled veterans property. Something about me. No state. Honestly these guys are these guys are the worst. Eminent domain without eminent domain by the way you don&rsquo;t up highways roads airports hospitals you know not bridges you drive anything so. They say Donald Trump does look like Eminem and I don&rsquo;t even tell me but you need to road you need a highway you need you know it&rsquo;s funny they all want they all want the keystone pipeline right but without eminent domain without think of it without eminent domain you can&rsquo;t have the keystone pipeline and we&rsquo;re going to get the keystone pipeline approved but but fluids jumps. It&rsquo;s jobs but remember this when it gets approved a politicians go to baby approve it.</p>
</blockquote>
<p><strong>Sanders</strong></p>
<blockquote>
<p>President Falwell and. David. Ok thank you very much for inviting my wife Jane and. Ought to be with you this morning we appreciate the invitation. Very much. And let me start off by acknowledging what I think. All of you already know. And that is the views. That many here at liberty university have. And all I. On a number of important issues. A very very different. I believe in women&rsquo;s rights dnmt. In the light of the woman to control her own body dnmt. I believe in gay rights. And now. Those of my views. And it is no secret. But I came here today. Because I believe from the bottom of my heart. That it is vitally important for those of us. Who hold different views. To be able to engage in any civil discourse. Who often in our country and I think both sides. Bear responsibility for us. There is too much shouting at each other. There is too much making fun of each other. Now in my view then are you can say this is somebody who whose voice is hoarse because I have given dozens of speeches. And the last few months it is easy. To go out and talk to people who agree with you are missing Greensboro North Carolina just last night. Alright. We are nine thousand people out. Mostly they agreed with me tonight. We&rsquo;re going to be a Manassas and thousands out they agree with me. It&rsquo;s not a whole lot to do. That&rsquo;s what politicians by and large do we go out and we talk to people who agree with us. But it is harder. But not less important. For us to try and communicate with those who do not agree with us on every issue. After. And it is important to see where if possible and I do believe it&rsquo;s possible we can find common grounds. No liberty university. Is a religious school obviously. Pn. All of you are proud of the. You already school. Which as all of us in our own way. Tries to understand the meaning of morality. What does it mean. To live a moral life. And you try to understand in this very complicated modern world that we live in. What the words of the Bible me in today&rsquo;s society. You are in school which tries to teach its students. How to behave with decency and with honesty and how you can best relates. To your fellow human beings and I applaud. You for trying to achieve those goals. Let me. Take a moment. Or a few moments. To tell you what motivates me. And the work that I do. As a public servant as a Sentinel. From the state of Vermont. And let me tell you that it goes without saying I am flaws foh throw me being a perfect human being. But all I am motivated by a vision.</p>
</blockquote>
<h2 id="sentiment-analysis">Sentiment Analysis</h2>
<p>Again, there are a variety of tools to do this, including the <a href="http://www.nltk.org/">Natural Language Toolkit Project</a>, a free python library. Taking advantage of a <a href="http://text-processing.com/demo/sentiment/">simple demo site</a> which uses the NLTK, we can see that both Sanders and Trump are polar, but Sanders is more positive. Who would&rsquo;ve known?</p>
<p><strong>Trump</strong></p>
<ul>
<li>Overall: negative</li>
<li>Subjectivity
<ul>
<li>neutral: 0.2</li>
<li><strong>polar: 0.8</strong></li>
</ul></li>
<li>Polarity
<ul>
<li>pos: 0.4</li>
<li><strong>neg: 0.6</strong></li>
</ul></li>
</ul>
<p><strong>Sanders</strong></p>
<ul>
<li>Overall: positive</li>
<li>Subjectivity
<ul>
<li>neutral: 0.2</li>
<li><strong>polar: 0.8</strong></li>
</ul></li>
<li>Polarity
<ul>
<li><strong>pos: 0.8</strong></li>
<li>neg: 0.2</li>
</ul></li>
</ul>
<p>For the adventuresome, <a href="http://www.nltk.org/howto/sentiment.html">here are more detailed instructions</a> on using the NLTK for sentiment analysis.</p>
</description>
</item>
<item>
<title>Using Grunt to Manage Static Assets</title>
<link>https://ders.github.io/post/2016-02-03-static-assets-with-grunt/</link>
<pubDate>Wed, 03 Feb 2016 17:51:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-02-03-static-assets-with-grunt/</guid>
<description>
<p>I <a href="https://ders.github.io/post/2016-01-14-static-assets-for-websites/">previously posted</a> about using GNU Make to manage front-end assets for a website. A colleague suggested that I should check out <a href="http://gruntjs.com/getting-started">Grunt</a> as it does everything I need to do and more. So here it is.</p>
<p>I have the same goals as I did last week:</p>
<ul>
<li>concatenate an arbitrary combination of js files, minifying them in the process</li>
<li>preprocess css with sass</li>
<li>copy directories i and lib untouched</li>
<li>run a watch process to update files as they&rsquo;re changed</li>
</ul>
<h2 id="installing-grunt">Installing grunt</h2>
<p>Grunt is part of the <a href="https://nodejs.org/">node.js</a> ecosystem, and as such is available via <a href="https://www.npmjs.com/">the node package manager (npm)</a>. Npm is available on OS X via Homebrew.</p>
<h3 id="basic-npm-concepts">Basic npm concepts</h3>
<p>There are a few things that we need to understand about npm. The biggest headache was recognizing the difference between local and global installs and knowing when to use which.</p>
<ul>
<li>Npm installs packages into a project (unless the <code>-g</code> global option is specified, more on that later) and needs to be run in project root. Packages then go into a subdirectory called <code>node_packages</code>.</li>
<li>If you&rsquo;re in some other directory when running npm, the packages will go into a <code>node_packages</code> subdirectory there and confuse you.</li>
<li>Npm expects to see a file called <code>package.json</code> in the project root directory and complains if it&rsquo;s not there.
<code>package.json</code> includes a list of packages that the project depends on, and the default <code>npm install</code> without any parameters installs those packages.</li>
<li>When installing a package explicitly, there is in an option to add an entry to <code>package.json</code> so that someone else will be able to use <code>npm install</code> and get everything. Note that this is an option and not the default behavior.</li>
</ul>
<h3 id="creating-the-package-json-file">Creating the package.json file</h3>
<p>According to <a href="http://gruntjs.com/getting-started#package.json">the documentation</a>,
the command to use is <code>npm init</code>, and it must be run in project root. Running it starts a dialog on the terminal, asking some mostly irrelevant questions: name (defaults to the name of the project directory), version (defaults to 1.0.0), description, entry point (defaults to index.js), test command, git repository, keywords, author, and license (defaults to ISC). These questions can be suppressed by using <code>npm init --yes</code>, which defaults everything.</p>
<p>Unfortunately, npm will complain if it doesn&rsquo;t see a description, a repository field and a license field. The defaults only cover the license field, leaving the description blank and the repository field missing altogether.</p>
<p>The minimum <code>package.json</code> has <a href="https://docs.npmjs.com/getting-started/using-a-package.json#requirements">just a name and a version</a>.
But since I&rsquo;m <a href="https://www.bignerdranch.com/blog/a-bit-on-warnings/">a stickler for getting rid of warnings</a>, I&rsquo;m going to have to create my own <code>package.json</code> that includes name, version, description, repository and license. None of this information is relevant; its only purpose is to make the warnings go away.</p>
<pre><code>{
&quot;name&quot;: &quot;taco&quot;,
&quot;version&quot;: &quot;1.0.0&quot;,
&quot;description&quot;: &quot;xyz&quot;,
&quot;repository&quot;: {
&quot;type&quot;: &quot;git&quot;,
&quot;url&quot;: &quot;xyz&quot;
},
&quot;license&quot;: &quot;ISC&quot;
}
</code></pre>
<p>Unfortunately there&rsquo;s one warning I can&rsquo;t get rid of. At the time of this writing, <code>npm install grunt</code> produces this:</p>
<pre><code>npm WARN deprecated [email protected]: lodash@&lt;2.0.0 is no longer maintained. Upgrade to lodash@^3.0.0
</code></pre>
<p>According to <a href="https://github.com/lodash/lodash/wiki/Changelog#v092">the changelog for lodash</a>,
version 0.9.2 was released in 2012, and the current version is 4.0.0. Even the &ldquo;upgrade to&rdquo; version of 3.0.0 is a year old already. This is a red flag; how and why are these dependencies not getting maintained? That said, it appears that <a href="https://github.com/gruntjs/grunt/issues/1419">an update is on the way</a>. Will have to ignore this warning for now.</p>
<h3 id="grunt-plugins">Grunt plugins</h3>
<p>Grunt itself is just the overlord; to do any real work we&rsquo;re going to need some plugins. After a lot of googling, I&rsquo;ve come up with this list:</p>
<ul>
<li>To minify and combine javascript files, we can use <code>grunt-contrib-uglify</code>.</li>
<li>To compile scss into css, we can use <code>grunt-contrib-sass</code>.</li>
<li>To copy directories, we can use <code>grunt-contrib-copy</code>.</li>
<li>To delete old files, we can use <code>grunt-contrib-clean</code>.</li>
<li>To watch for changes and recompile, we can use <code>grunt-contrib-watch</code>.</li>
</ul>
<p>All of these are <a href="http://gruntjs.com/plugins">marked as officially maintained</a>, giving us the warm, fuzzy feeling that everything is going to work.</p>
<p>We can now install grunt and the plugins.</p>
<pre><code>npm install grunt grunt-contrib-uglify grunt-contrib-sass grunt-contrib-copy grunt-contrib-clean grunt-contrib-watch --save-dev
</code></pre>
<h3 id="grunt-command-line">Grunt command line</h3>
<p>There is one more install required if we are to be able to run grunt from the command line. The package is <code>grunt-cli</code>, and needs to be installed globally so that the grunt executable goes into /usr/local/bin and is available in the system path.</p>
<p>npm install grunt-cli -g</p>
<p>It&rsquo;s possible to install <code>grunt-cli</code> in the project directory, but then the executable will be in node_modules/.bin instead of /usr/local/bin, and that makes more headaches for us</p>
<p>One gotcha is that the global grunt-cli requires a local grunt or it will fail. Grunt-cli is a wrapper to find the locally installed grunt to whatever project you&rsquo;re in. The global grunt-cli will not find a global grunt.</p>
<h3 id="summary-of-grunt-installation">Summary of grunt installation</h3>
<ul>
<li>Install npm (e.g. <code>brew install npm</code>).</li>
<li>Create the package.json file shown above.</li>
<li><code>npm install grunt grunt-contrib-uglify grunt-contrib-sass grunt-contrib-copy grunt-contrib-clean grunt-contrib-* watch --save-dev</code></li>
<li><code>npm install grunt-cli -g</code></li>
</ul>
<p><code>package.json</code> should go into source control, and <code>node_modules</code> should be excluded from source control with the appropriate entry in <code>.gitignore</code>.</p>
<p>Once we have <code>package.json</code> as updated by the npm install &ndash;save-dev command, steps 2 and 3 can be replaced by a simple <code>npm install</code>. We still need to keep step 4; global packages can&rsquo;t go into <code>package.json</code> (npm will ignore <code>--save-dev</code> when <code>-g</code> is specified).</p>
<h3 id="optionally-installing-grunt-cli-locally">Optionally installing grunt-cli locally</h3>
<p>Installing <code>grunt-cli</code> locally instead of globally will allow it to be included in <code>package.json</code>, but it has the side effect of not having the grunt executable in the path. A possible workaround to this side effect is to add a script section to <code>package.json</code> with all the grunts you want to do.</p>
<pre><code>&quot;scripts&quot;: { &quot;watch&quot;: &quot;grunt watch&quot; }
</code></pre>
<p>Then you can type <code>npm run watch</code> instead of <code>grunt watch</code>. This may or may not be worth the trouble.</p>
<h2 id="writing-a-gruntfile">Writing a gruntfile</h2>
<h3 id="basic-gruntfile-concept">Basic gruntfile concept</h3>
<p>The gruntfile is a bit of javascript initialization that gets run whenever grunt is invoked. The gruntfile needs to define an initialization function and assign that to the global <code>module.exports</code>. Within the initialization function, we&rsquo;ll need to list the modules we need (grunt-contrib-uglify, etc.), specify some configuration for each module, define the default task, and optionally define additional tasks.</p>
<p>Each plugin defines a task of the same name as the plugin (e.g. grunt-contrib-uglify defines an &ldquo;uglify&rdquo; task, under which any number of subtasks may be defined).</p>
<p>The gruntfile is named <code>Gruntfile.js</code> and resides in project root. The basic gruntfile structure is:</p>
<pre><code>module.exports = function(grunt) {
grunt.initConfig({
pluginname: { ... } // one of these for each plugin
});
grunt.loadNpmTasks( ... ); // one of these for each plugin
grunt.registerTask('default', ... ); // define the default behavior of `grunt` with no parameters
grunt.registerTask( ... ); // optional additional tasks
}
</code></pre>
<p>Each plugin defines a task of the same name as the plugin (e.g. grunt-contrib-uglify defines an &ldquo;uglify&rdquo; task, under which any number of subtasks may be defined). Defining additional tasks is useful for combining tasks into a single command.</p>
<p>A thorough read of <a href="http://gruntjs.com/getting-started">the docs</a> along with <a href="https://www.google.co.kr/search?q=gruntfile+examples">some examples</a> gives us enough information to build a single gruntfile, giving us the following commands:</p>
<ul>
<li><code>grunt</code> does a clean build, deleting <code>pub</code> if it exists and building everything from <code>src</code>.</li>
<li><code>grunt build</code> does an incremental build of js and css files, updating only those files whose source has changed.</li>
<li><code>grunt copy</code> syncs the directories <code>i</code> and <code>lib</code> from <code>src</code> to <code>pub</code>.</li>
<li><code>grunt watch</code> runs until you kill it, watching for changes in <code>src</code> and updating <code>pub</code> as necessary.</li>
</ul>
<p>Note that <code>grunt</code> is short for <code>grunt all</code>, which does <code>grunt clean</code> + <code>grunt copy</code> + <code>grunt build</code>.</p>
<script src="https://gist.github.com/ders/3ca946b14641e5efe783.js"></script>
<h3 id="observations">Observations</h3>
<ul>
<li>Overall, the quality of documentation is poor. I had to resort to copying examples and then modifying them by trial and error until I got the results I wanted. There are many alternate syntaxes, causing further confusion.</li>
<li>Could not find a way to do incremental updates with uglify. The entire js collection is rebuilt whenever any js source file changes.</li>
<li>The sass plugin depends on having command-line sass installed as a ruby gem, a dependency that I grudgingly accepted when writing the previous makefile and was hoping to avoid.</li>
<li>Dependencies from <code>@import</code> statements in scss source files are handled nicely; the dependencies are honored when doing an incremental build and don&rsquo;t need to be included in the gruntfile. This is nice.</li>
<li>The <code>grunt-contrib-copy</code> plugin doesn&rsquo;t know how to sync. The <code>i</code> and <code>lib</code> directories are copied in their entireties every time there&rsquo;s a change. There is <a href="https://github.com/tomusdrw/grunt-sync">another plugin</a> which claims to know how to sync, but I haven&rsquo;t tested it.</li>
</ul>
<h3 id="conclusion">Conclusion</h3>
<p>This was a whole lot of trouble to set up a relatively simple build system. Grunt is a powerful tool, and I can see the value of using it when you&rsquo;re already in a node-based project, but it to use it as an isolated build tool is not worth the effort.</p>
<p>The only thing we gained with Grunt is the ability to auto-detect imports in .scss files and do incremental updates accordingly. At the same time we lost the ability to incremental updates of the Javascript files, at least with the standard plugin.</p>
<p>I was also hoping to avoid the ruby sass dependency by using the plugin, but no luck there since the plugin is just a wrapper for the command line sass.</p>
</description>
</item>
<item>
<title>Static assets for websites</title>
<link>https://ders.github.io/post/2016-01-14-static-assets-for-websites/</link>
<pubDate>Thu, 14 Jan 2016 14:22:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-01-14-static-assets-for-websites/</guid>
<description>
<p>Count me in on the developers who believe that <a href="http://www.gnu.org/software/make/manual/make.html">GNU make</a> is the best tool for assembling static assets.</p>
<h3 id="the-general-problem">The general problem</h3>
<p>We need to maintain a set of files B that is derived from another set of files A through some known (and possibly complicated) transformation. We edit the files in set A but not in set B. We would like a simple way to (1) create B from A, and (2) update B when A changes, only recreating the parts that are necessary.</p>
<h3 id="the-more-specific-problem">The more specific problem</h3>
<p>B is the set of static assets for a web service, and A is the set of source files used to make them. Only A will be checked into source control, and only B will be uploaded to the web server.</p>
<p>There are different kinds of assets in A that need to be treated differently.</p>
<p><strong>Javascript</strong></p>
<ul>
<li><p>My Javascript source files are formatted nicely and full of meaningful, well-thought-out comments. I would like the js files sent with the web pages to be devoid of comments and mashed together so as to be almost unreadable. This can be accomplished by piping the files through <a href="http://www.crockford.com/javascript/jsmin.html">JSMin</a> on the way from A to B.</p></li>
<li><p>My Javascript source files are modular, and one page may need several files. These are best combined into one file for faster loading. Also, any source file could be included in several combination files. I would like the ability to have each js file in B created from an arbitrary combination of source files from A.</p></li>
</ul>
<p><strong>CSS</strong></p>
<ul>
<li>All my css is written as scss and needs to be processed with an scss compiler such as <a href="http://sass-lang.com/">Sass</a>. Scss files may import other sccs files, a fact we need to be aware of when detecting changes.</li>
</ul>
<p>Other assets such as images and precompiled libraries can be copied from A to B without modification.</p>
<h3 id="what-to-do">What to do</h3>
<p>The first thing is to define a directory structure.</p>
<p>For set A we&rsquo;ll make a subdirectory <code>src</code> in project root with four subdirectories: <code>js</code> for Javascript sources, <code>css</code> for scss sources, <code>i</code> for image files, and <code>lib</code> for precompiled libraries.</p>
<p>For set B we&rsquo;ll make a subdirectory <code>pub</code> in project root. Compiled js and css files will go directly in <code>pub</code>, and the two subdirectories <code>i</code> and <code>lib</code> will mirror <code>src/i</code> and <code>src/lib</code>.</p>
<pre><code>.
├── src
│ ├── js
│ ├── css
│ ├── i
│ └── lib
└── pub
├── i
└── lib
</code></pre>
<p>Next we need to make a list of the js and css files we would like generated and placed into <code>pub</code>. We&rsquo;ll do that by defining variables <code>JSFILES</code> and <code>CSSFILES</code>, e.g.:</p>
<pre><code>JSFILES := main.js eggs.js pancake.js
CSSFILES := blueberry.css yogurt.css
</code></pre>
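<p>One way to turn those lists into full target paths (a sketch; the gist below shows the real makefile) is with make&rsquo;s <code>addprefix</code> function:</p>
<pre><code>TARGETS := $(addprefix pub/,$(JSFILES) $(CSSFILES))

build: $(TARGETS)
</code></pre>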
<p>After that, we need to define the dependencies for each of these files, e.g.:</p>
<pre><code>pub/main.js: src/js/main.js
pub/eggs.js: src/js/eggs.js src/js/milk.js
pub/pancake.js: src/js/milk.js src/js/flour.js src/js/eggs.js
pub/blueberry.css: src/css/blueberry.scss src/css/fruit.scss
pub/yogurt.css: src/css/yogurt.scss
</code></pre>
<p>To simplify things, we&rsquo;ll define the default dependency to be one source file of the same name, so we can omit dependency definitions for <code>main.js</code> and <code>yogurt.css</code>. We&rsquo;ll also define <code>JS := src/js</code>, <code>CSS := src/css</code> and <code>PUB := pub</code>.</p>
<pre><code>$(PUB)/eggs.js: $(JS)/eggs.js $(JS)/milk.js
$(PUB)/pancake.js: $(JS)/milk.js $(JS)/flour.js $(JS)/eggs.js
$(PUB)/blueberry.css: $(CSS)/blueberry.scss $(CSS)/fruit.scss
</code></pre>
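<p>This default-dependency convention is what make&rsquo;s pattern rules give us almost for free: a pattern rule supplies the same-name prerequisite and the recipe, while the explicit lines above merely add extra prerequisites, all of which show up in <code>$^</code>. A sketch (the gist below is the authoritative version):</p>
<pre><code># Default: pub/foo.js depends on src/js/foo.js of the same name.
$(PUB)/%.js: $(JS)/%.js
	cat $^ | jsmin &gt; $@
</code></pre>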
<p>Finally, we need to make a list of directories to be copied directly from <code>src</code> to <code>pub</code>.</p>
<pre><code>COPYDIRS := lib i
</code></pre>
<p>This is now enough information for us to build a simple makefile, giving us (at least) the following commands:</p>
<ul>
<li><code>make</code> does a clean build, deleting <code>pub</code> if it exists and building everything from <code>src</code>.</li>
<li><code>make build</code> does an incremental build of js and css files, updating only those files whose source has changed.</li>
<li><code>make copy</code> syncs the directories <code>i</code> and <code>lib</code> from <code>src</code> to <code>pub</code>.</li>
<li><code>make watch</code> runs until you kill it, watching for changes in <code>src</code> and updating <code>pub</code> as necessary.</li>
</ul>
<p>Note that <code>make</code> is short for <code>make all</code>, which does <code>make clean</code> + <code>make copy</code> + <code>make build</code>.</p>
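<p>The top-level wiring for those commands amounts to something like this sketch (the gist below has the real thing):</p>
<pre><code>.PHONY: all build copy clean watch

all: clean copy build

clean:
	rm -rf $(PUB)
</code></pre>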
<script src="https://gist.github.com/ders/627147bf67544c96f8be.js"></script>
<h3 id="how-it-works">How it works</h3>
<p>The meat of this makefile is in the pattern rules (lines 43-55). Quick cheat sheet: <code>$@</code> = target, <code>$^</code> = all dependencies, <code>$&lt;</code> = the first dependency. <a href="http://www.gnu.org/software/make/manual/make.html#Automatic-Variables">Details are here.</a></p>
<p>The first rule takes care of <code>main.js</code> and <code>eggs.js</code>.</p>
<p>The second rule takes care of <code>pancake.js</code>. Note that <code>pancake.js</code> doesn&rsquo;t match the first rule because there is no source file called pancake.</p>
<p>The third rule takes care of <code>blueberry.css</code> and <code>yogurt.css</code>. Note that on line 55 <code>fruit.scss</code> is <strong>not</strong> supplied as an argument to sass. It&rsquo;s only listed as a dependency because <code>blueberry.scss</code> contains an <code>@import &quot;fruit&quot;;</code> directive.</p>
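<p>In rule form, the css half looks something like this sketch: only <code>$&lt;</code>, the first dependency, is handed to sass, while the remaining dependencies exist purely to trigger rebuilds when an imported file changes.</p>
<pre><code>$(PUB)/%.css: $(CSS)/%.scss
	sass $&lt; $@
</code></pre>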
<p>Finally, lines 32-36 take care of syncing directories <code>i</code> and <code>lib</code>.</p>
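<p>A minimal version of that sync might loop over <code>COPYDIRS</code> with rsync (a sketch, assuming rsync is available; the gist&rsquo;s exact approach may differ):</p>
<pre><code>copy:
	for d in $(COPYDIRS); do rsync -a --delete src/$$d/ pub/$$d/; done
</code></pre>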
<p>In the end, our filesystem looks like this:</p>
<pre><code>.
├── src
│ ├── js
│ │ ├── eggs.js
│ │ ├── flour.js
│ │ ├── main.js
│ │ └── milk.js
│ ├── css
│ │ ├── blueberry.scss
│ │ ├── fruit.scss
│ │ └── yogurt.scss
│ ├── i
│ │ ├── hanjan.jpg
│ │ └── ikant.png
│ └── lib
│ └── MooTools-Core-1.5.2-compressed.js
├── pub
│ ├── i
│ │ ├── hanjan.jpg
│ │ └── ikant.png
│ ├── lib
│ │ └── MooTools-Core-1.5.2-compressed.js
│ ├── blueberry.css
│ ├── eggs.js
│ ├── main.js
│ ├── pancake.js
│ └── yogurt.css
└── Makefile
</code></pre>
<h3 id="dependencies">Dependencies</h3>
<p>This makefile requires <code>jsmin</code>, <code>sass</code> and <code>watchman-make</code> to be available at the command line.</p>
<p>Jsmin and <a href="https://facebook.github.io/watchman/docs/install.html">Watchman</a> (which includes watchman-make) are available on OS X via Homebrew. Sass is not (yet), but it can be installed as a system-wide ruby gem. I&rsquo;m not a fan of requiring rubygems for my decidedly anti-rails build system, but since Sass runs nicely from the command line I&rsquo;ll turn a blind eye for now.</p>
<p>Jsmin is also <a href="https://libraries.io/npm/jsmin">available via npm</a>.</p>
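<p>For the record, getting all three onto an OS X machine amounts to something like this (assuming Homebrew and rubygems are already set up):</p>
<pre><code>brew install jsmin watchman
gem install sass
</code></pre>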
<h3 id="other-features-i-d-like-to-include">Other features I&rsquo;d like to include</h3>
<p>It would be nice to automatically detect @import statements in scss source files and generate dependency lists based on that. I&rsquo;m aware that the Sass package has its own watcher that handles dependencies, but using that would mean bypassing a significant part of the makefile, thereby making a mess.</p>
<p>It would be pretty simple to add a <code>make deploy</code> command that rsyncs <code>pub</code> to the server. I&rsquo;ll probably do that later.</p>
<h3 id="a-feature-i-excluded-on-purpose">A feature I excluded on purpose</h3>
<p>Many web frameworks automatically append timestamps or version numbers to static assets in order to defeat browser caching. This adds a whole lot of complexity for a pretty minor benefit. Once a site is in production, I expect updates to be few and far between, and I&rsquo;m happy to manually add a version number to a target filename as necessary.</p>
<h3 id="credits">Credits</h3>
<p>This Makefile was heavily influenced by and owes thanks to <a href="http://west.io/post/2015/04/11-frontend-builds-with-makefiles/">this blog post</a>. Thank you!</p>
</description>
</item>
<item>
<title>Google Sign-in</title>
<link>https://ders.github.io/post/2016-01-08-google-signin/</link>
<pubDate>Fri, 08 Jan 2016 15:50:00 +0900</pubDate>
<guid>https://ders.github.io/post/2016-01-08-google-signin/</guid>
<description>
<h2 id="using-google-sign-in-on-website-x">Using Google Sign-in on Website X</h2>
<p><strong>Disclaimer:</strong> <a href="https://developers.google.com/identity/sign-in/web/sign-in">Read the docs</a> too. This post doesn&rsquo;t cover everything.</p>
<p>A week ago I was completely clueless as to how Google sign-in works. I set out to write about it and learned a few things.</p>
<h3 id="overview">Overview</h3>
<p>Using Google sign-in on a website requires first doing the following in the <a href="https://console.developers.google.com/home/dashboard">Google developer&rsquo;s console</a>:</p>
<ul>
<li>creating a project</li>
<li>creating a sign-in client ID for that project</li>
<li>associating the domain(s) of the website with the sign-in client ID</li>
</ul>
<p>Sign-in is done using Javascript on the web page to talk directly to Google&rsquo;s servers. The Javascript is loaded from Google&rsquo;s servers. It is not necessary to involve the server for website X at all.</p>
<p>When Joe the Hacker attempts to sign in to website X, a popup dialog appears. The contents of the dialog depend on Joe&rsquo;s current signed-in state.</p>
<p>If Joe is not signed in to Google at all, then a sign-in dialog appears. If he&rsquo;s signed in to more than one account, then an account chooser dialog appears. If he&rsquo;s signed into exactly one account, then the sign-in part is skipped.</p>
<p>If this is the first time he&rsquo;s attempted to sign in to website X, then he&rsquo;ll be asked to give permission for website X to have access to his profile information (name, picture) and email address.</p>
<p>In the case that Joe needs neither the sign-in dialog nor the permissions dialog (i.e. he&rsquo;s already signed in to exactly one account and is a returning user), then the pop-up closes itself immediately without any user action.</p>
<p>The browser remembers that Joe is signed in to website X using Google sign-in. He can sign out of website X and still be signed into Google. However, if he signs out of Google, then he&rsquo;ll automatically be signed out of website X as well. He can&rsquo;t be signed in to website X using his Google ID and not also be signed in to Google.</p>
<p>If the webpage making the sign-in call is served from a domain that has not been registered in the console, then Joe will see a 400 error (redirect_uri_mismatch) and a picture of a broken robot when trying to sign in. The error page also exposes the email address of the account that the project is made under.</p>
<h3 id="javascript-details">Javascript details</h3>
<p>The file platform.js provides the global Google API object called <code>gapi</code> and the auth2 module. The auth2 module must be explicitly loaded into gapi with the <code>gapi.load</code> method before it&rsquo;s used. This method provides an optional callback for when/if the module is loaded successfully.</p>
<pre><code>gapi.load(&quot;auth2&quot;, callback);
</code></pre>
<p>Once the module is loaded, it must be initialized with the sign-in client ID (see above). The client ID may either be provided as <a href="https://developers.google.com/identity/sign-in/web/reference#gapiauth2initwzxhzdk20paramswzxhzdk21">an option to the init method</a> or in <a href="https://developers.google.com/identity/sign-in/web/sign-in#specify_your_apps_client_id">a meta tag in the document</a>. The init function returns a GoogleAuth object.</p>
<pre><code>gauth = gapi.auth2.init(options);
</code></pre>
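<p>For reference, the two ways of supplying the client ID look like this (the ID value is a made-up placeholder). As a meta tag in the document head:</p>
<pre><code>&lt;meta name=&quot;google-signin-client_id&quot; content=&quot;1234567890-abc.apps.googleusercontent.com&quot;&gt;
</code></pre>
<p>or as an option to init:</p>
<pre><code>gauth = gapi.auth2.init({ client_id: &quot;1234567890-abc.apps.googleusercontent.com&quot; });
</code></pre>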
<p>A logical initialization flow would be to have the initialization in the load callback.</p>
<pre><code>gapi.load(&quot;auth2&quot;, function() { gapi.auth2.init(); });
</code></pre>
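<p>Fleshed out a bit (a sketch, assuming the client ID is supplied via the meta tag): the GoogleAuth object returned by <code>gapi.auth2.init</code> is promise-like, so we can capture it in a <code>then</code> callback once initialization completes.</p>
<pre><code>gapi.load(&quot;auth2&quot;, function() {
  gapi.auth2.init().then(function(gauth) {
    // gauth is the initialized GoogleAuth object;
    // e.g. check the current sign-in state.
    console.log(&quot;signed in: &quot; + gauth.isSignedIn.get());
  });
});
</code></pre>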
<p>The GoogleAuth object may also be obtained any time after it&rsquo;s initialized using the <a href="https://developers.google.com/identity/sign-in/web/reference#gapiauth2getauthinstance"><code>getAuthInstance</code></a> method.</p>