Explanations
instances of emphasis or qualifiers in statements
Negative Logits
Token      Logit
pee        -0.14
iere       -0.14
illance    -0.14
DED        -0.13
awy        -0.13
ASIC       -0.13
odal       -0.13
Wir        -0.13
igue       -0.13
ertools    -0.13
Positive Logits
Token      Logit
being      0.17
along      0.17
through    0.16
trick      0.16
having     0.15
in         0.15
arrison    0.15
reverse    0.15
had        0.15
flush      0.15
Activations Density: 0.105%