INDEX
Explanations
actions related to harm and injury
Negative Logits
arij       -0.59
AU         -0.57
LOT        -0.56
sourcing   -0.56
ibia       -0.56
acron      -0.55
ARE        -0.54
ajo        -0.54
Bei        -0.54
Horton     -0.54
Positive Logits
by          1.09
ooter       0.76
otyp        0.75
ptin        0.74
By          0.72
ission      0.72
by          0.72
alive       0.71
elsewhere   0.71
entary      0.71
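On dashboards of this kind, the positive and negative logit lists are commonly obtained by projecting the feature's decoder direction onto each token's unembedding vector and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method, ignores the final layer norm, and uses invented names (`logit_extremes`, `w_dec`, `unembed`) and toy numbers; it is not this dashboard's actual code or data.

```python
from typing import Dict, List, Tuple

# Hypothetical types: a feature's decoder direction and per-token unembedding vectors.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def logit_extremes(
    w_dec: Vector, unembed: Dict[str, Vector], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k highest-scoring (positive) and k lowest-scoring (negative) tokens.

    A token's score is the dot product of the decoder direction with that
    token's unembedding vector: the feature's direct push on that token's
    logit (assumption: no final layer norm is applied).
    """
    scores = [(tok, dot(w_dec, vec)) for tok, vec in unembed.items()]
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    negative = scores[:k]                  # most negative first
    positive = scores[::-1][:k]            # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example; all numbers are invented for illustration.
    w_dec = [1.0, -0.5]
    unembed = {
        "tok_a": [1.0, 0.2],
        "tok_b": [0.5, 0.0],
        "tok_c": [-0.3, 0.4],
        "tok_d": [0.0, 0.6],
    }
    pos, neg = logit_extremes(w_dec, unembed, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

Because only the decoder direction and the unembedding are involved, these lists describe what the feature directly promotes or suppresses in the output, not which inputs make it fire.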
Activation Density: 0.134%
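Activation density is usually reported as the share of tokens in a sampled dataset on which the feature's activation is nonzero. Under that reading, 0.134% means the feature fires on about 1.34 tokens per 1,000, or roughly 1 token in 746 (1 / 0.00134 ≈ 746.3). The sketch below assumes that definition (activation strictly above a default threshold of 0.0); the function name `activation_density` and the sample activations are hypothetical.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a strictly
    positive activation, expressed as a percentage.
    """
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


if __name__ == "__main__":
    # Invented activations: 3 of 1,000 tokens fire.
    acts = [0.0] * 997 + [1.2, 0.4, 3.0]
    print(f"{activation_density(acts):.3f}%")
    # Convert a reported density (in percent) into "1 token in N".
    print(round(1 / (0.134 / 100)))
```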