INDEX
Explanations
the presence of specific personal names or identifiers
Negative Logits
Princ    -0.96
OM       -0.87
ple      -0.80
indo     -0.79
Um       -0.75
obser    -0.73
Pis      -0.73
Som      -0.72
Marian   -0.72
Ser      -0.71
Positive Logits
ck       1.25
cks      1.25
ock      1.14
ack      1.12
ocker    1.04
acks     1.02
ocks     1.02
rack     0.98
ucket    0.98
icken    0.98
Activations Density 0.068%
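
As a rough illustration of what a density figure like this could mean, the sketch below treats activation density as the fraction of token positions on which the feature fires. This definition, the function name activation_density, the threshold parameter, and the sample data are all assumptions made for illustration; the page itself does not say how the figure was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition for illustration only; the dashboard's actual
    computation is not given in the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 68 of which are active.
acts = [0.0] * 99_932 + [1.5] * 68
density = activation_density(acts)
print(f"{density:.3%}")
```

Under this assumed definition, a density of 0.068% corresponds to the feature being active on roughly 68 of every 100,000 tokens.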