Explanations
phrases indicating a state of existence or identity
Negative Logits
ERA           -0.14
Williamson    -0.14
gle           -0.13
'er           -0.13
ávÄĽ          -0.13
apter         -0.13
loquent       -0.13
him           -0.13
alue          -0.12
aurant        -0.12
Positive Logits
among         0.23
America       0.21
one           0.20
the           0.19
THE           0.17
amongst       0.17
Among         0.17
America       0.16
among         0.16
Europe        0.15
Activations Density 0.231%
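
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only; it assumes "density" means the fraction of activations strictly above a threshold of zero, and the function name `activation_density` and the sample values are hypothetical, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly greater than `threshold`.

    Assumption: a feature counts as "active" on a token when its activation
    exceeds the threshold (zero by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1000 tokens.
sample = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(sample):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.231% would correspond to the feature being active on roughly 2 to 3 tokens per thousand.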