INDEX
Explanations
words related to social issues and policy, particularly with a critical or contentious tone
negative or critical descriptors associated with various subjects
Negative Logits
iage        -0.72
urtles      -0.71
ggies       -0.67
Horizons    -0.66
ateurs      -0.65
eeks        -0.65
iets        -0.64
regor       -0.64
ynthesis    -0.63
Jackets     -0.62
Positive Logits
-)          0.90
-           0.90
]-          0.83
istic       0.82
functional  0.82
)-          0.81
'-          0.80
ocratic     0.79
-[          0.77
"-          0.75
Activations Density 0.330%