INDEX
Explanations
expressions of personal reactions and feelings
Negative Logits
Token       Logit
Flem        -0.16
ALSE        -0.16
орож        -0.15
regor       -0.15
avra        -0.15
ål          -0.15
amespace    -0.15
eyin        -0.15
MOTE        -0.15
emez        -0.15
Positive Logits
Token           Logit
aware           0.23
feel            0.23
proud           0.21
gig             0.20
uncomfortable   0.20
happy           0.20
into            0.20
fall            0.20
comfortable     0.20
sad             0.19
Activations Density 0.036%
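
For readers unfamiliar with the density figure, the following is a minimal sketch of one common way such a number could be computed: the percentage of token positions on which the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the page does not state how its 0.036% value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 9 of 25,000 tokens.
sample = [0.0] * 24_991 + [1.5] * 9
print(f"{activation_density(sample):.3f}%")
```

With this made-up sample the feature is active on 9 of 25,000 positions, which is the same order of sparsity as the figure reported above.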