INDEX
Explanations
statements expressing surprise or strong emotion
NEGATIVE LOGITS
vet          -0.15
ющ           -0.15
AC           -0.14
籍           -0.14
之一         -0.14
Woman        -0.14
Husband      -0.13
tiener       -0.13
Reply        -0.13
descendant   -0.13
POSITIVE LOGITS
folks        0.66
Fol          0.52
guys         0.52
ladies       0.50
fol          0.48
friends      0.42
Ladies       0.40
fol          0.40
folk         0.39
Guys         0.38
Activation density: 0.180%
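As a minimal sketch, assume that activation density means the percentage of token positions where the feature's activation is above zero. This is a common convention for feature dashboards of this kind, but the page itself does not define it. Under that assumption, the density figure and the ranked positive/negative logit lists could be derived as shown below. The function names `activation_density` and `top_and_bottom_logits`, the `threshold` parameter, and the input shapes (a flat list of activations and a token-to-logit-effect mapping) are all hypothetical and are not the dashboard's own code.

```python
# Hypothetical helpers: a sketch of how the statistics on this page might be
# computed, assuming density = share of tokens where the feature fires.

def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold."""
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


def top_and_bottom_logits(logit_effects, k=10):
    """Split a {token: logit_effect} mapping into the k most positive and k most negative.

    Positive entries are returned largest first; negative entries most negative first.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    positive = list(reversed(ranked[-k:]))
    return positive, negative


if __name__ == "__main__":
    # 18 firing positions out of 10,000 tokens gives a density of about 0.180%.
    print(f"{activation_density([0.0] * 9982 + [1.0] * 18):.3f}%")
    pos, neg = top_and_bottom_logits({"folks": 0.66, "guys": 0.52, "vet": -0.15, "AC": -0.14}, k=2)
    print(pos, neg)
```

A density this low (fewer than 2 tokens in 1,000) is consistent with a sparse feature that fires only in narrow contexts.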