INDEX
Explanations
words related to challenges, difficulty, or intensity
references to the concept of difficulty or challenge
Negative Logits
uality      -0.69
atern       -0.66
ership      -0.65
Burton      -0.65
rompt       -0.65
Spect       -0.64
itas        -0.63
TAG         -0.62
ulet        -0.62
Griffith    -0.62
Positive Logits
hardest      1.21
iest         0.88
destro       0.86
imaginable   0.84
toughest     0.83
hitter       0.82
harder       0.81
entimes      0.75
easiest      0.73
darkest      0.72
Activations Density 0.003%
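
The negative and positive logit lists can be read as the two ends of a single token-to-logit ranking. The following is a minimal sketch of that reading, not the dashboard's actual implementation: the mapping `logit_effects` and the helper `top_logits` are hypothetical names, and the values are a subset copied from the lists above.

```python
# Hypothetical token -> logit mapping; values are copied from the lists above.
logit_effects = {
    "hardest": 1.21, "iest": 0.88, "destro": 0.86,
    "imaginable": 0.84, "toughest": 0.83,
    "uality": -0.69, "atern": -0.66, "ership": -0.65,
    "Burton": -0.65, "rompt": -0.65,
}


def top_logits(effects, k=3):
    """Return (most_negative, most_positive) lists of (token, logit) pairs.

    Assumption: the dashboard's two lists are simply the k lowest and
    k highest entries of one ranking. This is an illustrative guess.
    """
    # Sort ascending by logit; Python's sort is stable, so ties keep input order.
    ranked = sorted(effects.items(), key=lambda item: item[1])
    most_negative = ranked[:k]
    # Take the k largest and reverse so the strongest token comes first.
    most_positive = ranked[-k:][::-1]
    return most_negative, most_positive


negative, positive = top_logits(logit_effects)
for token, value in positive:
    print(f"{token:<12}{value:>6.2f}")
for token, value in negative:
    print(f"{token:<12}{value:>6.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent with each other, which matters when the same token set is used to describe what a feature promotes and what it suppresses.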