INDEX
Explanations
directives or rules
expressions related to rules and guidelines in communication, particularly in social media contexts
Negative Logits
convergence      -0.72
refurb           -0.68
pioneering       -0.64
upgr             -0.62
staggered        -0.61
estimated        -0.61
trak             -0.60
unparalleled     -0.60
millenn          -0.60
Baz              -0.59
Positive Logits
anymore          1.07
inappropriately  0.97
nor              0.93
unnecessarily    0.85
iquette          0.85
disrespectful    0.84
.?               0.83
whatsoever       0.81
unless           0.80
slurs            0.79
Activations Density: 0.915%