INDEX
Explanations
references to specific individuals and their actions
Negative Logits
uling        -0.15
outil        -0.14
(contents    -0.14
dřev         -0.14
abb          -0.13
dust         -0.13
моб          -0.13
thinkable    -0.13
зависим      -0.13
mobile       -0.13
Positive Logits
conced       0.20
oct          0.19
concede      0.18
oct          0.18
attrib       0.15
dedic        0.15
attributed   0.14
surrendered  0.14
Reserve      0.14
vement       0.14
Activations Density 0.020%