INDEX
Explanations
words related to challenging or difficult situations
adjectives and adverbs that describe varying degrees of complexity, difficulty, or moral implications
Negative Logits
Token      Logit
pione      -0.63
oun        -0.61
aeper      -0.58
Citiz      -0.55
bryce      -0.54
earthqu    -0.54
ainted     -0.54
trave      -0.52
ij士       -0.50
ãĥĺãĥ©     -0.49
Positive Logits
Token      Logit
-)         0.91
)          0.85
,          0.83
,.         0.82
-.         0.78
--         0.74
,,         0.74
,-         0.73
--         0.73
)-         0.72
Activations Density 0.304%
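
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number could be computed: the fraction of token positions on which the feature's activation is above zero. The page does not state how its value was derived, so the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 3 of which are active.
sample = [0.0] * 997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.304% would mean the feature is active on roughly 3 of every 1000 token positions.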