INDEX
Explanations
phrases with social or political relevance
symbol sequences or non-standard characters
Negative Logits
Downs           -0.71
Practices       -0.68
straw           -0.66
tremend         -0.64
decomp          -0.63
ãĥīãĥ©ãĤ´ãĥ³    -0.63
whistle         -0.62
dispers         -0.62
Pavilion        -0.62
assemb          -0.61
Positive Logits
į               1.03
ı               0.99
¤               0.98
ł               0.97
Ķ               0.96
«               0.91
¶               0.90
Ĥ               0.90
¬               0.90
±               0.90
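
Several tokens in the lists above look like the printable stand-ins that byte-level BPE vocabularies use for raw bytes. The sketch below assumes the GPT-2-style byte-to-character table; the source does not say which tokenizer produced these tokens, so that table is an assumption, and the helper names `token_to_bytes` and `token_to_text` are hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable character table (assumed: GPT-2-style byte-level BPE)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Bytes with no printable form are mapped to code points 256, 257, ...
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, map(chr, code_points)))


# Inverse table: vocabulary character -> raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token):
    """Hypothetical helper: recover the raw bytes behind a vocabulary token."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


def token_to_text(token):
    """Hypothetical helper: decode as UTF-8, replacing incomplete sequences."""
    return token_to_bytes(token).decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(token_to_text("ãĥīãĥ©ãĤ´ãĥ³"))
    for tok in ["į", "ı", "¤", "ł", "Ķ", "«", "¶", "Ĥ", "¬", "±"]:
        print(repr(tok), token_to_bytes(tok).hex())
```

Under this assumption, the negative-logit token `ãĥīãĥ©ãĤ´ãĥ³` decodes to ドラゴン, and every positive-logit token maps to a single byte in the range 0x80 to 0xBF, that is, a UTF-8 continuation byte. That would be consistent with the "symbol sequences or non-standard characters" explanation, though it holds only if the vocabulary really uses this byte mapping.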
Activations Density 0.117%