INDEX
Explanations
words related to physical or mental effort, challenge, or resistance
terms related to intensity and qualities of actions and experiences
Negative Logits
poons          -0.71
avorite        -0.70
pilot          -0.66
ayne           -0.66
tsky           -0.64
guide          -0.63
team           -0.63
rule           -0.62
Annotations    -0.62
mite           -0.61
Positive Logits
uously         1.44
ously          1.34
uous           1.20
uing           1.09
cially         1.01
ements         0.97
ued            0.94
uity           0.90
ues            0.87
quet           0.87
Activations Density: 0.033%
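To give a sense of scale for the density figure, here is a minimal sketch. It assumes that "density" means the fraction of tokens on which the feature's activation is above zero; the page does not state this definition. The function name `activation_density` and the sample data are hypothetical, chosen only to reproduce a value of 0.033%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 33 of 100,000 tokens.
sample = [1.0] * 33 + [0.0] * 99_967
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.033% would mean the feature is active on roughly 33 out of every 100,000 tokens, i.e. it fires rarely.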