INDEX
Explanations
references to nonverbal communication and body language
Negative Logits
dera          -0.17
hea           -0.16
ienne         -0.15
cales         -0.15
awe           -0.15
generators    -0.14
rina          -0.14
ears          -0.14
onces         -0.14
gom           -0.14
Positive Logits
å¥            0.14
ocking        0.14
ADOS          0.14
rust          0.14
æĶ¯           0.14
mary          0.14
Cir           0.14
RC            0.14
_RC           0.13
chas          0.13
Activations Density: 0.212%