INDEX
Explanations
mentions or instances of rules or restrictions being enforced or proposed
references to prohibitions or bans on various subjects
Negative Logits
IMAGES        -0.68
prest         -0.65
rendition     -0.64
Maid          -0.64
Generations   -0.63
Turk          -0.63
Tuc           -0.61
Cous          -0.61
contrace      -0.60
Io            -0.60
Positive Logits
ishment        1.43
hammer         1.15
ishing         1.13
eful           1.11
lie            0.99
jo             0.98
zai            0.95
tering         0.95
nered          0.94
ning           0.93
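The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. This page does not say how the scores are computed. The sketch below assumes one common approach: project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. `top_logit_effects` and the toy weights and vocabulary are hypothetical and only show the mechanics.

```python
import numpy as np

def top_logit_effects(w_dec_feature, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the logits (assumed method).

    w_dec_feature: shape (d_model,), hypothetical decoder direction of the feature
    W_U: shape (d_model, vocab_size), unembedding matrix
    vocab: list of token strings with len(vocab) == vocab_size
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    scores = w_dec_feature @ W_U          # one score per vocabulary token
    order = np.argsort(scores)            # ascending: most negative first
    top_negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    top_positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return top_positive, top_negative

# Toy example: a 2-dimensional model with a 4-token vocabulary (hypothetical values).
w = np.array([1.0, 0.0])
W_U = np.array([[2.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 1.0]])
vocab = ["a", "b", "c", "d"]
pos, neg = top_logit_effects(w, W_U, vocab, k=2)
print(pos)  # [('a', 2.0), ('c', 0.5)]
print(neg)  # [('b', -1.0), ('d', 0.0)]
```

Because the scores depend only on the weights, not on any input text, lists like these describe what the feature promotes or suppresses when it fires, not the contexts in which it fires.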
Activation Density: 0.046%
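A density of 0.046% means the feature is active on roughly 46 of every 100,000 tokens. The sketch below assumes density is the fraction of sampled tokens where the feature's activation is strictly above a threshold of zero. The function name `activation_density`, the `threshold` parameter and the synthetic activations are hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens where the feature's activation exceeds `threshold`.

    Assumes "active" means activation > threshold (hypothetical default 0.0).
    """
    acts = np.asarray(acts)
    return np.count_nonzero(acts > threshold) / acts.size

# Synthetic activations: the feature fires on 46 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:46] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.046%
```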