INDEX
Explanations
adjectives with strong emotional connotations or ethical significance
words and phrases related to ethical, offensive, and revolutionary concepts
Negative Logits
orah        -0.74
udeb        -0.70
region      -0.69
onde        -0.67
onds        -0.66
essage      -0.66
adobe       -0.64
ournal      -0.64
ogie        -0.64
secution    -0.63
Positive Logits
enough       1.12
insofar      0.79
fodder       0.78
nonetheless  0.77
Enough       0.76
isable       0.75
deterrent    0.74
istically    0.74
enough       0.72
undermin     0.72
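Each list pairs a vocabulary token with a logit value. Dashboards of this kind often derive such values from a feature's direct effect on the output logits: the feature's decoder direction is multiplied by the model's unembedding matrix, and the most negative and most positive entries are reported. The sketch below illustrates that idea under stated assumptions. The names `decoder_direction`, `unembed`, `vocab`, and `top_logit_effects` are hypothetical, and the sketch ignores any final LayerNorm, so it may not match exactly how the values above were computed.

```python
import numpy as np


def top_logit_effects(decoder_direction, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    Assumption: effect = decoder_direction @ unembed, with no final
    LayerNorm applied. This is a common simplification, not necessarily
    the method behind the lists above.
    """
    # Direct logit effect per token, shape (d_vocab,)
    effects = decoder_direction @ unembed
    order = np.argsort(effects)
    # k most negative effects, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    # k most positive effects, most positive first
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical values)
direction = np.array([1.0, 0.5])
W_U = np.array([[1.0, -1.0, 0.0, 2.0],
                [0.0, 2.0, -2.0, 0.0]])
neg, pos = top_logit_effects(direction, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('c', -1.0), ('b', 0.0)]
print(pos)  # [('d', 2.0), ('a', 1.0)]
```

Sorting once with `argsort` and reading from both ends yields the negative and positive lists in a single pass over the vocabulary.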
Activation density: 0.469%
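Activation density is usually read as the share of sampled tokens on which the feature fires. A minimal sketch follows, assuming that "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold are illustrative assumptions; the token sample and threshold behind the 0.469% figure are not stated here.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature "fires" on a token when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    activations = np.asarray(activations)
    firing = np.count_nonzero(activations > threshold)
    return 100.0 * firing / activations.size


# Hypothetical sample: the feature fires on 469 of 100,000 tokens
acts = np.zeros(100_000)
acts[:469] = 0.3
print(activation_density(acts))  # 0.469
```

At this density the feature is active on roughly 1 token in 213, which marks it as a sparse feature.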