Explanations
words related to societal issues such as racism, authority, and power dynamics
terms associated with race and social commentary
Negative Logits
guiActiveUnfocused    -0.78
Yose                  -0.71
Enlarge               -0.63
Pryor                 -0.62
Rack                  -0.59
Grimes                -0.58
Genetics              -0.57
Rare                  -0.56
Reported              -0.56
Reck                  -0.55
Positive Logits
ifies                 0.86
heartedly             0.84
izes                  0.83
tarians               0.82
bably                 0.80
iates                 0.80
sucks                 0.79
gered                 0.79
handedly              0.77
fully                 0.77
Activations Density 0.633%