Explanations
phrases related to legal actions or consequences
phrases indicating punishment or consequences related to actions
Negative Logits
Token             Logit
obyl              -0.74
venants           -0.72
ires              -0.68
士                -0.68
%%                -0.66
DragonMagazine    -0.66
atl               -0.65
eda               -0.63
ocity             -0.62
atar              -0.62
Positive Logits
Token             Logit
refusing          1.33
violating         1.30
daring            1.23
failing           1.20
breaching         1.17
exercising        1.12
interfering       1.11
gery              1.11
possessing        1.09
criticizing       1.09
Activations Density 0.133%