INDEX
Explanations
actions and physical responses associated with surprise or shock
Negative Logits
Sass     -0.16
878      -0.14
133      -0.14
chuck    -0.14
bw       -0.14
ICODE    -0.14
arel     -0.14
ury      -0.14
flag     -0.14
æ°       -0.14
Positive Logits
WSC       0.16
ripp      0.15
stanov    0.15
noch      0.14
pornstar  0.14
buffs     0.14
offen     0.14
ovnÄĽ     0.14
صÙģ      0.14
FTA       0.14
Activations Density: 0.361%