INDEX
Explanations
email addresses with a specific protection level
references to protected content or concepts related to intellectual property rights
Negative Logits
bender    -0.79
hur       -0.79
elia      -0.69
nexus     -0.67
ellen     -0.66
COL       -0.66
EEK       -0.65
lore      -0.63
bang      -0.63
wow       -0.63
Positive Logits
protected  0.77
agonist    0.76
anonymity  0.75
folios     0.71
imedia     0.71
igion      0.70
igious     0.69
arians     0.69
ected      0.68
eas        0.67
Activations Density 0.022%