Explanations
words or phrases related to being banned or barred from certain actions or places
terms related to restrictions or prohibitions
Negative Logits
worn            -0.81
Engineers       -0.73
imon            -0.68
cule            -0.66
hidden          -0.66
framework       -0.65
disappoint      -0.65
dimension       -0.65
ensional        -0.63
});             -0.63
Positive Logits
indefinitely     0.88
altogether       0.85
atever           0.82
interstate       0.76
premises         0.72
activities       0.71
accessing        0.70
participating    0.70
flights          0.69
offending        0.68
Activation Density: 0.477%