Explanations
content related to news articles or statements from official sources
Negative Logits
Token          Logit
improv         -0.73
stump          -0.72
entimes        -0.70
temptation     -0.68
jugg           -0.67
crou           -0.66
charisma       -0.66
trivia         -0.66
experien       -0.65
fascination    -0.64
Positive Logits
Token          Logit
Statement      0.92
emphasis       0.90
itled          0.86
titled         0.81
lishes         0.81
redacted       0.80
idelines       0.79
stated         0.79
"...           0.79
letter         0.79
Activations Density: 1.740%