INDEX
Explanations
words that express actions or conditions involving strong physical and emotional states
Negative Logits
ey        -0.22
eeee      -0.22
eyer      -0.22
asure     -0.19
eee       -0.19
utenant   -0.17
BERS      -0.17
conut     -0.17
asures    -0.17
efa       -0.17
Positive Logits
esign     0.41
orf       0.33
ragon     0.32
own       0.31
iamond    0.30
iction    0.29
irect     0.28
owns      0.28
uct       0.28
ed        0.28
Activations Density 0.108%