Explanations
prohibits or restricts actions
Negative Logits
elucidation   0.36
ritor         0.36
tych          0.32
zejména       0.30
vem           0.30
momentous     0.29
των           0.29
suave         0.29
scoperta      0.29
fruitful      0.29
Positive Logits
violate       0.32
illegally     0.31
нару          0.31
unfairly      0.30
包含          0.30
beitet        0.30
violates      0.29
prohibits     0.29
напрямую      0.29
限制          0.29
Activations Density: 0.000%