INDEX
Explanations
phrases related to advocating for specific actions or beliefs
Negative Logits
ciation      -0.71
doubtless    -0.70
PG           -0.64
inally       -0.64
eatures      -0.63
zzi          -0.62
nifty        -0.60
albeit       -0.60
ionics       -0.60
ector        -0.59
Positive Logits
anymore          1.27
nor              1.02
any              0.95
unnecessarily    0.92
anything         0.88
blindly          0.88
rash             0.86
frivol           0.86
jeopard          0.85
disrespect       0.85
Activations Density: 0.416%