Explanations
references to Canadian historical events and their portrayal in society
Negative Logits
Token       Logit
ahat        -0.16
ãģıãģł      -0.15
çĵľ         -0.15
hel         -0.15
fats        -0.14
Moor        -0.14
hausen      -0.14
wrapping    -0.14
erox        -0.14
Voyager     -0.13
Positive Logits
Token       Logit
Dominion    0.28
Domin       0.24
domin       0.23
CPR         0.22
conf        0.21
Sir         0.20
Upper       0.20
Sir         0.19
fur         0.19
-cn         0.18
Activations Density 0.048%