Explanations
words related to emotional states and behaviors
Negative Logits
Token        Logit
y            -0.43
yat          -0.17
olest        -0.15
odal         -0.15
Hlav         -0.14
yah          -0.14
Gros         -0.14
Kab          -0.14
PointSize    -0.14
affer        -0.13
Positive Logits
Token        Logit
ym           0.39
ypass        0.39
ystate       0.39
ystack       0.38
ies          0.37
ysize        0.37
ys           0.37
yst          0.36
ypress       0.36
yp           0.35
Activations Density: 0.053%