Explanations
specific numerical information, such as quantities or rankings
Negative Logits
Token       Logit
utics       -0.82
each        -0.72
strength    -0.71
their       -0.71
allah       -0.68
rey         -0.68
lag         -0.68
terness     -0.68
erved       -0.68
apt         -0.67
Positive Logits
Token         Logit
casualty      1.17
thing         1.11
reason        0.94
beneficiary   0.93
major         0.91
obstacle      0.88
culprit       0.87
installment   0.86
piece         0.86
exception     0.85
Activations Density: 1.390%