Explanations
references to specific numeric or coded values in diverse contexts
Negative Logits
Token      Logit
â̦↵↵      -0.15
â̦)↵↵     -0.15
illard     -0.14
neau       -0.14
rint       -0.14
âĢĬ        -0.14
Geb        -0.13
virt       -0.13
“â̦       -0.13
á¿¶        -0.13
Positive Logits
Token      Logit
Bul        0.23
Iraq       0.20
War        0.20
--↵        0.19
Ava        0.18
grasp      0.17
Mosul      0.17
--         0.17
Iraqi      0.17
'--        0.16
Activations Density: 0.001%