Explanations
positive or exciting things described as "awesome" with a strong activation value
expressions of strong enthusiasm or admiration
Negative Logits
licts        -0.71
mediate      -0.70
uting        -0.68
avis         -0.67
arenthood    -0.67
vere         -0.66
epad         -0.65
epend        -0.64
unker        -0.64
utions       -0.64
Positive Logits
ly           0.99
NESS         0.88
Awesome      0.79
fun          0.77
awesome      0.77
teamwork     0.76
stuff        0.74
ness         0.73
sounding     0.72
ery          0.72
Activations Density 0.027%