INDEX
Explanations
references to legal infractions or misconduct
terms related to violations of rules or laws
Negative Logits
Token             Logit
vy                -0.67
essen             -0.64
cies              -0.63
liam              -0.62
alky              -0.61
zi                -0.61
soDeliveryDate    -0.61
hatched           -0.60
anyahu            -0.60
çļ                -0.59
Positive Logits
Token             Logit
raction           1.24
ractions          1.04
Citation          0.75
fixme             0.73
ij士               0.73
DragonMagazine    0.72
terness           0.72
Extras            0.72
somew             0.70
rection           0.69
Activations Density: 0.007%