Explanations
positive adjectives like "awesome" with moderately high activation values
instances of the word "awesome" and its variations
Negative Logits
Token         Logit
mediate       -0.70
Clar          -0.68
idation       -0.66
reditation    -0.65
Downloadha    -0.64
apers         -0.63
avis          -0.63
Inquiry       -0.63
licts         -0.63
ASE           -0.63
Positive Logits
Token         Logit
ly            0.98
ery           0.97
ness          0.93
stuff         0.82
NESS          0.80
stuff         0.77
artwork       0.77
nels          0.76
GIF           0.76
sounding      0.75
Activations Density: 0.075%
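
The density figure above is usually read as the fraction of tokens on which the feature fires at all. The page does not state its exact definition, so the sketch below is only one plausible reading: the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default). The dashboard may use a
    different cutoff or sampling scheme.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 3 active tokens out of 4000.
sample = [1.0] * 3 + [0.0] * 3997
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.075% would correspond to roughly 3 active tokens in every 4000 sampled.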