INDEX
Explanations
actions related to rules or authority
actions that involve making mistakes or facing consequences
Negative Logits
ilogy             -0.73
inger             -0.70
DragonMagazine    -0.70
erm               -0.70
Depending         -0.66
ciating           -0.65
endif             -0.65
mort              -0.64
fortunately       -0.64
alg               -0.64
Positive Logits
improper                   0.80
illegal                    0.76
BuyableInstoreAndOnline    0.71
イト                       0.70
illegally                  0.66
violations                 0.65
unsafe                     0.65
insufficient               0.64
excessively                0.63
improperly                 0.62
Activations Density 0.187%