Explanations
references to individuals or groups in various contexts
Negative Logits
grav        -0.15
cons        -0.15
endor       -0.14
æª          -0.14
avanaugh    -0.14
wnd         -0.14
Stan        -0.14
812         -0.14
istra       -0.14
81          -0.13
Positive Logits
hua         0.15
ORM         0.15
absolut     0.15
préc        0.14
CADE        0.14
elter       0.14
Rena        0.14
absol       0.14
AIT         0.14
renc        0.14
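The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such ranked lists are commonly produced follows. It assumes (the dashboard does not say) that a token's logit effect is the dot product of the feature's decoder vector with that token's unembedding column. The function names `logit_effects` and `top_and_bottom` and the toy 2-d vectors are hypothetical illustrations, not the dashboard's code or real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_row: Sequence[float],
                  unembed_columns: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # ASSUMPTION: effect on token t = dot(feature decoder vector, unembedding column of t).
    return {tok: sum(d * u for d, u in zip(decoder_row, col))
            for tok, col in unembed_columns.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by effect once, then slice both ends.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative effect first
    positive = ranked[::-1][:k]    # most positive effect first
    return negative, positive


# Toy example with made-up 2-d vectors (hypothetical data).
decoder_row = [1.0, -0.5]
unembed = {"a": [0.2, 0.0], "b": [-0.1, 0.2], "c": [0.0, -0.2]}
neg, pos = top_and_bottom(logit_effects(decoder_row, unembed), k=1)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends keeps the two lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` would avoid a full sort.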
Activation Density: 0.006%
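Activation density is the share of token positions on which the feature is active, so 0.006% means the feature fires on roughly 6 of every 100,000 tokens. The sketch below computes it under the assumption that "active" means an activation strictly greater than zero; the dashboard does not state its threshold. The helper names and the activation values are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # ASSUMPTION: a token counts as active when its activation exceeds `threshold` (default 0).
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def as_percent(fraction: float) -> str:
    # Format a fraction as a percentage with three decimal places.
    return f"{fraction * 100:.3f}%"


# Made-up data: 6 active positions out of 100,000.
acts = [0.0] * 99_994 + [1.3, 0.7, 2.1, 0.4, 0.9, 1.1]
density = activation_density(acts)
print(as_percent(density))  # prints "0.006%"
```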