INDEX
Explanations
words related to desire or aspiration
terms related to environmental conditions and perceptions of desirability or envy
Negative Logits
Token     Logit
Reviewer  -0.76
ammy      -0.71
ppo       -0.70
Gum       -0.67
butt      -0.64
head      -0.64
die       -0.63
Books     -0.63
ramid     -0.63
Arcane    -0.62
Positive Logits
Token     Logit
env       1.21
isions    0.99
env       0.89
ENTION    0.85
ours      0.84
igrated   0.83
oried     0.81
ancies    0.80
ounter    0.80
antly     0.77
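The page does not say how these logit scores are computed. Sparse-autoencoder feature dashboards commonly take the feature's decoder direction, project it through the model's unembedding matrix, and report the highest- and lowest-scoring vocabulary entries. The sketch below assumes that approach. The helper `logit_effects`, its argument names, and the toy matrices are hypothetical and do not come from this page.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by decoder_dir @ unembed.

    decoder_dir has length d_model; unembed has d_model rows and
    len(vocab) columns. Returns (top-k most positive, top-k most negative)
    as (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Dot product of the feature direction with each token's unembedding column.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending once: the head holds the most negative scores,
    # the tail the most positive.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_positive = ranked[::-1][:k]
    top_negative = ranked[:k]
    return top_positive, top_negative


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["env", "Gum", "the"]
unembed = [
    [1.0, -0.5, 0.0],
    [0.0, 0.5, 0.2],
]
pos, neg = logit_effects([1.0, -0.5], unembed, vocab, k=1)
print(pos)  # [('env', 1.0)]
print(neg)  # [('Gum', -0.75)]
```

Because the ranking runs over vocabulary entries rather than display strings, two distinct entries that render identically (for example, with and without a leading space) can both appear. That may explain why `env` is listed twice among the positive logits, though the page does not confirm it.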
Activation Density: 0.006%
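The page does not define activation density. A common reading is the fraction of sampled tokens on which the feature's activation is above zero. The minimal sketch below assumes that definition. `activation_density` and its `threshold` parameter are hypothetical, and the activation values are made up for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption: 'density' means the share of sampled tokens on which the
    feature fires (activation > threshold); the page does not define it.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up data: 6 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_994 + [2.3] * 6
density = activation_density(acts)
print(f"{density:.3%}")  # 0.006%
```

Under this reading, a density of 0.006% means the feature fires on roughly 6 of every 100,000 tokens, which makes it a very sparse feature.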