INDEX
Explanations
phrases related to disciplinary actions or conflicts
actions involving failure to comply with rules or regulations
Negative Logits
endif          -0.77
ciating        -0.73
depending      -0.71
inger          -0.68
fortunately    -0.68
gradation      -0.67
erm            -0.66
cking          -0.66
assuming       -0.65
atari          -0.65
Positive Logits
sake           0.87
illegal        0.79
improper       0.76
purposes       0.73
nonviolent     0.64
unlawful       0.63
insensitive    0.63
tein           0.62
withd          0.62
unsafe         0.62
Activations Density 0.162%