INDEX
Explanations
words with negative connotations or describing negative characteristics/actions
instances of the word "bad" in various contexts
Negative Logits
ĸļ            -0.91
raltar        -0.84
ensional      -0.79
ittees        -0.78
conservancy   -0.77
aukee         -0.77
eters         -0.76
olate         -0.76
ynthesis      -0.74
illation      -0.74
Positive Logits
dest          1.08
dies          1.04
die           0.98
gered         0.92
karma         0.91
ger           0.89
luck          0.86
manners       0.80
publicity     0.78
ged           0.78
Activations Density 0.025%