INDEX
Explanations
references to individuals taking action or needing to act
Negative Logits
zin        -0.17
lej        -0.17
endoza     -0.15
culo       -0.15
à¤ĺ        -0.15
rome       -0.15
ĥģ         -0.15
ì¹´ëĿ¼     -0.15
spa        -0.14
ÄĻd        -0.14
Positive Logits
else       0.19
oni        0.15
Schmidt    0.14
crack      0.14
somewhere  0.14
(s         0.14
Somebody   0.14
ze         0.14
call       0.13
whom       0.13
Activations Density 0.077%