Explanations
references to dining experiences and social judgments related to character traits
Negative logits
sider  -0.16
eng  -0.15
atos  -0.15
onta  -0.15
لق  -0.14
Harvey  -0.14
504  -0.14
354  -0.14
outh  -0.14
omo  -0.13
Positive logits
adol  0.15
섭  0.15
acin  0.15
AtPath  0.15
/fw  0.15
mun  0.14
届  0.14
Incontri  0.14
ExecutionContext  0.14
hevik  0.14
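The page does not say how these logit scores were produced. Feature dashboards of this kind often compute them by projecting the feature's decoder direction onto the model's unembedding matrix, then listing the tokens with the most negative and most positive scores. The sketch below assumes exactly that construction; `logit_effects`, `decoder_dir`, `unembed`, and `vocab` are hypothetical names, and real tooling may add steps (for example a final layer norm) that are omitted here.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed layout: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every vocabulary token by (decoder_dir . unembed[:, v]).

    Hypothetical helper: returns (k most negative, k most positive)
    as (token, score) pairs, each list ordered from most extreme inward.
    """
    n_vocab = len(vocab)
    # Dot product of the feature's decoder direction with each token's
    # unembedding column gives that token's direct logit contribution.
    scores = [
        sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
        for v in range(n_vocab)
    ]
    order = sorted(range(n_vocab), key=lambda v: scores[v])  # ascending
    negative = [(vocab[v], scores[v]) for v in order[:k]]
    # Slice from n_vocab - k so that k == 0 yields an empty list.
    positive = [(vocab[v], scores[v]) for v in reversed(order[max(n_vocab - k, 0):])]
    return negative, positive


if __name__ == "__main__":
    # Toy two-dimensional model with a three-token vocabulary (made-up numbers).
    neg, pos = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.0, 0.4, -0.2]], ["a", "b", "c"], k=1)
    print(neg, pos)
```

With a real model, `decoder_dir` would be one row of the autoencoder's decoder weights and `unembed` the model's unembedding matrix; the two lists above would then correspond to the negative and positive columns, assuming the dashboard uses this projection.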
Activation density: 0.161%
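Activation density is usually read as the fraction of sampled token positions on which the feature is active, so 0.161% means roughly 1.6 tokens in every 1,000. The sketch below assumes "active" means an activation strictly above zero; the dashboard's actual threshold and token sample are not stated, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Hypothetical helper; the default threshold of 0.0 is an assumption.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Made-up sample: 161 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_839 + [2.5] * 161
    print(f"{activation_density(acts):.3%}")  # prints 0.161%
```

A feature this sparse fires on only a small slice of text, which is consistent with the narrow explanation given for it.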