INDEX
Explanations
terms related to negative emotions or displeasure
instances of the word "loathe" and its variations
Negative Logits
Token      Logit
rition     -0.80
glass      -0.78
pillar     -0.77
rity       -0.76
manship    -0.75
Norn       -0.74
hower      -0.73
sonian     -0.70
ITAL       -0.69
lished     -0.68
Positive Logits
Token      Logit
aned       1.05
aning      1.04
oser       1.02
veland     0.99
aves       0.99
vers       0.89
ishly      0.89
ppy        0.88
ven        0.88
igh        0.87
Activations Density 0.010%
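
As a rough illustration of what a density figure like this could mean, the sketch below assumes that density is the fraction of tokens on which the feature has a non-zero activation. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration only; the page does not state
    how its density figure is computed.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 1 token out of 10,000.
sample = [0.0] * 9_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.010%
```

Under that assumption, a density of 0.010% would correspond to the feature firing on roughly one token in ten thousand.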