INDEX
Explanations
emphatic statements regarding personal experiences and feelings
NEGATIVE LOGITS
xico      -0.17
gaard     -0.16
iring     -0.15
umpt      -0.15
OTO       -0.15
аду       -0.14
aticon    -0.14
duk       -0.13
liest     -0.13
robe      -0.13
POSITIVE LOGITS
trust      0.62
Trust      0.55
Trust      0.54
trust      0.54
trusts     0.45
believe    0.44
Believe    0.43
相信       0.38
-bel       0.34
belief     0.34
Activations Density 0.212%