INDEX
Explanations
phrases related to denial or disclaimers
statements expressing denial or lack of involvement
Negative Logits
beware          -0.70
utenberg        -0.68
surprisingly    -0.68
hailed          -0.63
srfAttach       -0.61
plenty          -0.59
ismo            -0.59
surprisingly    -0.59
badass          -0.57
hilarious       -0.57
Positive Logits
nor             1.52
whatsoever      1.44
anybody         1.19
anything        1.09
anymore         1.04
nor             1.00
…"              0.98
any             0.96
[               0.95
..."            0.94
Activations Density 0.536%
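
The density figure above can be read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; it assumes density means "fraction of positions with activation above a threshold", which this page does not confirm. The function name, the threshold parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 5 of which activate the feature.
toy = [0.0] * 995 + [0.3, 1.2, 0.8, 2.1, 0.05]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.536% would mean the feature is active on roughly 5 in every 1000 sampled tokens.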