Explanations
references to organizations or groups associated with political or military actions
Negative logits (token, logit)
Ab            -0.20
Aj            -0.19
abstraction   -0.17
Burst         -0.15
_asm          -0.15
assi          -0.15
AF            -0.15
:async        -0.14
illy          -0.14
axis          -0.14
Positive logits (token, logit)
ob             0.40
ab             0.39
abs            0.36
tab            0.33
able           0.33
ables          0.29
lab            0.29
аб             0.28
ub             0.27
ib             0.27
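Top positive and negative logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding and ranking every vocabulary token by the result. The sketch below assumes exactly that procedure (and ignores any final LayerNorm); it is not confirmed as the method behind this dashboard, and the names `logit_effects`, `top_logits`, and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of the (hypothetical) feature decoder direction with
    each token's unembedding column: how much the feature pushes that
    token's logit up (positive) or down (negative)."""
    effects = {}
    for tok, col in unembed.items():
        if len(col) != len(decoder_dir):
            raise ValueError(f"dimension mismatch for token {tok!r}")
        effects[tok] = sum(d * u for d, u in zip(decoder_dir, col))
    return effects


def top_logits(effects: Dict[str, float], k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (most positive first, most negative first), k tokens each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return positive, negative


# Toy example with a 2-dimensional residual stream (made-up numbers).
decoder_dir = [1.0, 0.0]
unembed = {
    "ab": [0.4, 1.0],
    "ob": [0.5, -2.0],
    "axis": [-0.1, 3.0],
    "Aj": [-0.2, 0.0],
}
positive, negative = top_logits(logit_effects(decoder_dir, unembed), k=2)
print(positive)  # tokens "ob" then "ab"
print(negative)  # tokens "Aj" then "axis"
```

Sorting once and slicing both ends keeps the two lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest`/`heapq.nsmallest` would avoid a full sort.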
Activation density: 0.035%
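Activation density is usually read as the percentage of sampled token positions on which the feature fires, i.e. has an activation above zero. Under that assumption, 0.035% means roughly 35 active positions per 100,000 tokens. A minimal sketch, where the function name `activation_density` and the zero threshold are assumptions rather than details from this dashboard:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (hypothetical helper; the zero threshold is an assumption)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 35 firing positions out of 100,000 sampled tokens -> about 0.035%.
sample = [0.0] * 100_000
for i in range(35):
    sample[i * 1000] = 1.5
print(activation_density(sample))
```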