Explanation: Phrases and terms related to prohibitions or restrictions, particularly laws and bans.
Negative logits
Token      Logit
tí         -0.15
cul        -0.15
aux        -0.14
ests       -0.14
aggi       -0.14
finish     -0.14
enha       -0.14
یات        -0.14
Timing     -0.14
enko       -0.13
Positive logits
Token      Logit
ishment    0.20
quet       0.16
sp         0.15
quets      0.14
LOUR       0.14
ushort     0.13
.fre       0.13
forall     0.13
лід        0.13
irl        0.13
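The negative and positive logit lists above rank vocabulary tokens by how strongly the feature pushes each token's logit down or up. A minimal sketch of that ranking follows. It assumes, as is common for sparse-autoencoder dashboards but not stated here, that each score is the feature's decoder direction projected through the model's unembedding matrix. The names `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical, and the toy inputs are illustrative, not this feature's real weights.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the feature's direct effect on the output logits.

    Assumption (hypothetical): decoder_dir has shape (d_model,) and
    W_U has shape (d_model, vocab_size), so decoder_dir @ W_U yields
    one logit-effect score per vocabulary token.
    """
    effects = decoder_dir @ W_U                # shape: (vocab_size,)
    order = np.argsort(effects)                # token ids, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: a 3-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0, 0.0])
W_U = np.array([
    [0.20, -0.15, 0.05, 0.00],
    [0.30,  0.10, -0.20, 0.40],
    [-0.10, 0.25, 0.35, -0.05],
])
vocab = ["a", "b", "c", "d"]
negative, positive = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(negative)  # [('b', -0.15), ('d', 0.0)]
print(positive)  # [('a', 0.2), ('c', 0.05)]
```

Because `decoder_dir` selects only the first row of `W_U` here, the effects are simply `[0.20, -0.15, 0.05, 0.00]`, which makes the sorted output easy to check by hand.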
Activation density: 0.049%
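An activation density of 0.049% means the feature fires on roughly 49 of every 100,000 token positions, so it is very sparse. The sketch below computes density as the fraction of positions whose activation exceeds a threshold. The zero threshold and the function name `activation_density` are assumptions for illustration; the dashboard does not state its exact definition.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    Assumption (hypothetical): a position counts as "active" when its
    activation is strictly greater than `threshold` (default 0.0).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: 49 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:49] = 1.0
density = activation_density(acts)
print(f"{density * 100:.3f}%")  # 0.049%
```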