INDEX
Explanations
adjectives related to the evaluation of concepts or things
phrases related to opinions or assessments of societal and cultural topics
NEGATIVE LOGITS
igent           -0.77
conclud         -0.74
steps           -0.70
ktop            -0.70
Materials       -0.69
underscores     -0.69
reiterate       -0.68
cellaneous      -0.68
edIn            -0.68
»Ĵ              -0.67
POSITIVE LOGITS
ruining          1.16
gonna            1.10
racist           1.09
sexist           1.04
extinct          1.04
crap             1.04
nt               1.02
unbeat           1.02
BAD              1.00
horrible         1.00
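Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix, which gives one score per vocabulary token, and then sorting those scores. The sketch below assumes that convention. The helper names `logit_effects` and `top_tokens`, along with the toy vocabulary, are hypothetical. Real dashboards may normalize or filter the scores differently.

```python
from typing import List, Tuple

def logit_effects(decoder_dir: List[float], W_U: List[List[float]]) -> List[float]:
    """Hypothetical helper: project a feature's decoder direction (length d_model)
    through an unembedding matrix W_U (d_model x d_vocab), giving one score per token."""
    d_model, d_vocab = len(W_U), len(W_U[0])
    assert len(decoder_dir) == d_model
    return [sum(decoder_dir[i] * W_U[i][v] for i in range(d_model)) for v in range(d_vocab)]

def top_tokens(scores: List[float], vocab: List[str], k: int,
               largest: bool = True) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k tokens with the largest (positive list)
    or smallest (negative list) scores, rounded to two decimals as in the dashboard."""
    order = sorted(range(len(scores)), key=lambda v: scores[v], reverse=largest)
    return [(vocab[v], round(scores[v], 2)) for v in order[:k]]

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
vocab = ["bad", "ok", "steps"]
W_U = [[1.0, -1.0, 0.0],
       [0.0, 2.0, -2.0]]
scores = logit_effects([1.0, 0.5], W_U)
print("positive:", top_tokens(scores, vocab, 1))
print("negative:", top_tokens(scores, vocab, 1, largest=False))
```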
Activation density: 0.406%
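Activation density is usually reported as the percentage of dataset tokens on which the feature's activation exceeds zero. Read that way, 0.406% means the feature fires on roughly 4 of every 1,000 tokens. The following is a minimal sketch under that assumption. The function name `activation_density` and the default threshold of 0 are illustrative, not taken from the dashboard.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Two of five tokens fire, so the density is 40%.
print(activation_density([0.0, 0.0, 1.2, 0.0, 0.3]))
```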