Explanations
words associated with emotional expressions
Negative Logits
Token    Logit
ney      -0.17
nya      -0.16
isset    -0.16
iti      -0.16
maf      -0.16
neys     -0.16
iet      -0.15
sel      -0.15
lor      -0.14
lu       -0.14
Positive Logits
Token    Logit
ek       0.29
eting    0.25
ering    0.25
eming    0.24
eking    0.24
ez       0.24
evil     0.23
eper     0.23
enie     0.23
eks      0.23
Activations Density: 0.087%
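
As a rough illustration of how a density figure like this is commonly obtained, the sketch below counts the fraction of tokens on which a feature fires. It assumes that "density" means the share of tokens with activation above zero; the page does not state the exact definition. The function name, the threshold parameter, and the sample values are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count tokens where the feature is considered "active".
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Hypothetical data: one active token out of 10,000.
    sample = [0.0] * 9999 + [1.3]
    print(f"{activation_density(sample):.3%}")
```

Under this assumed definition, a density of 0.087% would mean the feature is active on roughly 87 of every 100,000 tokens.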