Explanations
references to the city of Waterloo or Wellington
the occurrences of the terms "Waterloo" and "Wellington"
Negative Logits
Token       Logit
ures        -0.85
manship     -0.80
ibli        -0.78
ively       -0.77
charact     -0.76
lishing     -0.74
nar         -0.74
て          -0.74
es          -0.73
MENTS       -0.71
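Token strings in lists like these are often stored in GPT-2's byte-level BPE form, where non-ASCII text is spelled with stand-in characters. The Japanese token て, for example, is stored as `ãģ¦`. The sketch below assumes the model behind this dashboard uses a GPT-2-style byte-level vocabulary (the dashboard does not say which tokenizer it uses). It mirrors the `bytes_to_unicode` table from OpenAI's GPT-2 encoder; `decode_token` is a hypothetical helper written for illustration.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    # Printable bytes map to the character with the same code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a character starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a raw byte-level vocabulary string into text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãģ¦"))              # て
print(repr(decode_token("ĠWaterloo")))  # ' Waterloo'
```

The `Ġ` prefix that appears on many GPT-2 tokens is simply the stand-in for a leading space byte.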
Positive Logits
Token       Logit
Waterloo     0.92
yip          0.83
ategory      0.83
Denis        0.82
Laur         0.82
agine        0.80
Wellington   0.79
renheit      0.79
oleon        0.78
ipop         0.76
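Logit lists on feature dashboards of this kind are commonly produced by projecting the feature's decoder direction straight through the model's unembedding matrix and taking the highest- and lowest-scoring tokens. The sketch below assumes that method; the dashboard may additionally normalize or rescale the scores, which is not attempted here. The function name `logit_effects`, its arguments, and the toy matrices are all hypothetical, with made-up numbers chosen only to make the ranking easy to follow.

```python
import numpy as np


def logit_effects(w_dec_f, w_u, vocab, k=10):
    """Project one feature's decoder direction through the unembedding and
    return the k most-boosted and k most-suppressed tokens."""
    # w_dec_f: (d_model,) decoder row for the feature (hypothetical input)
    # w_u:     (d_model, d_vocab) unembedding matrix
    scores = w_dec_f @ w_u       # (d_vocab,) direct effect on each token's logit
    order = np.argsort(scores)   # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up numbers: d_model = 3, four-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_u = np.array([
    [0.9, -0.8, 0.1, 0.0],
    [0.5, 0.5, 0.5, 0.5],
    [-0.3, 0.2, 0.0, 0.7],
])
w_dec_f = np.array([1.0, 0.0, 0.0])  # feature points along the first axis

positive, negative = logit_effects(w_dec_f, w_u, vocab, k=2)
print(positive)  # [('tok_a', 0.9), ('tok_c', 0.1)]
print(negative)  # [('tok_b', -0.8), ('tok_d', 0.0)]
```

Because this projection ignores every layer between the feature and the output, it captures only the feature's direct effect on the logits, which is why subword fragments with no obvious connection to the explanation can still appear near the top or bottom of the lists.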
Activation Density: 0.024%
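Activation density is usually read as the fraction of sampled token positions on which the feature fires. The sketch below assumes that definition with a firing threshold of zero; the dashboard does not state its threshold or sample size, and the function name `activation_density` and the toy activations are hypothetical. At 0.024%, the feature would fire on roughly 24 of every 100,000 tokens.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy data: the feature fires on 24 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:24] = 1.3

print(f"{activation_density(acts):.3%}")  # 0.024%
```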