INDEX
Explanations
references to personal connections and relationships
Negative Logits
Token     Value
путем     -0.17
ylie      -0.15
otland    -0.15
Binder    -0.15
sobie     -0.14
asc       -0.14
.Fat      -0.14
odont     -0.14
oko       -0.14
Cav       -0.14
Positive Logits
Token     Value
/us       0.18
anst      0.18
rg        0.16
324       0.15
738       0.15
mun       0.14
940       0.14
ALA       0.14
oord      0.14
dao       0.14
Activations Density: 0.133%