INDEX
Explanations
phrases related to legal punishments or sentences
phrases indicating legal penalties or sentences
Negative Logits
illin            -0.91
soType           -0.83
ern              -0.80
edin             -0.78
DragonMagazine   -0.75
Craft            -0.75
atl              -0.74
LAB              -0.74
unny             -0.73
bleacher         -0.72
Positive Logits
violating     0.97
gery          0.96
breaching     0.93
questioning   0.91
geries        0.89
aging         0.87
committing    0.84
daring        0.83
accessing     0.83
bidden        0.82
Activations Density: 0.144%