INDEX
Explanations
threatening or confrontational language
aggressive language and threats
Negative Logits
preceding       -0.75
quartered       -0.75
lacked          -0.70
opting          -0.69
unusually       -0.69
imilar          -0.69
pired           -0.68
strikingly      -0.68
seemingly       -0.68
overshadowed    -0.66
Positive Logits
tomorrow        1.20
morrow          1.06
whoever         1.02
someday         1.00
!"              0.98
..."            0.96
ASAP            0.95
fuckin          0.93
…"              0.92
______          0.88
Activations Density 0.718%
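
The density figure above is most naturally read as the share of sampled tokens on which the feature fires at all. The sketch below illustrates that arithmetic under this assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default). The dashboard may use a
    different definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1000 tokens, of which 7 have a non-zero activation.
sample = [0.0] * 993 + [1.5] * 7
print(f"{activation_density(sample):.3f}%")
```

A low percentage like this indicates a sparse feature, one that responds to a narrow slice of the input rather than to most tokens.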