Explanations
references to interpersonal relationships and interactions
Negative Logits
juan     -0.15
311      -0.15
arsi     -0.14
Cres     -0.14
есп      -0.14
aris     -0.14
924      -0.14
rof      -0.13
ulton    -0.13
331      -0.13
Positive Logits
/us      0.17
уда      0.16
zelf     0.14
們       0.14
into     0.14
الی      0.14
يان      0.13
instein  0.13
liches   0.13
ALI      0.13
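Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest per-token scores. The sketch below shows that projection in pure Python. It assumes a decoder vector `decoder_vec` of length d_model and an unembedding matrix `unembed` of shape d_model × vocab. The function name `logit_effects` and both argument names are hypothetical and are not this dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative, most positive) tokens by direct logit effect."""
    if len(unembed) != len(decoder_vec):
        raise ValueError("decoder_vec length must equal unembed rows (d_model)")
    d_model = len(decoder_vec)
    # Dot the feature direction with each token's unembedding column.
    scores = [
        sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort tokens by score, lowest first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most suppressed tokens, lowest score first
    positive = ranked[::-1][:k]  # most promoted tokens, highest score first
    return negative, positive


if __name__ == "__main__":
    neg, pos = logit_effects(
        [1.0, -1.0],
        [[0.5, 0.1, -0.2], [0.1, 0.3, 0.2]],
        ["a", "b", "c"],
        k=2,
    )
    print("negative:", neg)
    print("positive:", pos)
```

In this toy run, token "c" has the most negative score (about -0.4) and "a" has the most positive (about 0.4). The dashboard's lists correspond to `k=10` over the full vocabulary.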
Activation Density: 0.183%
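Activation density is usually read as the fraction of sampled token positions on which the feature is active. The sketch below computes that fraction. It assumes "active" means an activation strictly above a threshold of 0. Both the threshold and the hypothetical function name `activation_density` are illustrative and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature fires above the (assumed) threshold.
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # 2 firing positions out of 1,000 tokens.
    sample = [0.0] * 998 + [1.2, 0.7]
    print(f"{activation_density(sample):.3%}")
```

On this reading, a density of 0.183% means the feature fires on roughly 1.8 of every 1,000 tokens.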