INDEX
Explanations
the word "awesome" with various intensities
expressions of excitement or positivity
Negative Logits
avis          -0.74
arenthood     -0.70
Breed         -0.70
AUT           -0.68
Downloadha    -0.68
PT            -0.68
Clin          -0.67
eper          -0.66
unker         -0.66
arians        -0.65
Positive Logits
awesome       0.92
Awesome       0.85
NESS          0.84
fun           0.83
ly            0.82
stuff         0.79
thing         0.77
teamwork      0.76
synergy       0.74
idea          0.72
Activations Density 0.019%
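
A density figure like the one above is usually read as the fraction of sampled token positions on which the feature fires at all. The sketch below shows that arithmetic under stated assumptions: the function name `activation_density`, the zero threshold, and the synthetic activations are all hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "firing" means activation > threshold (0.0 by default).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Synthetic illustration only: 100,000 token positions, 19 of them active.
acts = [1.5] * 19 + [0.0] * (100_000 - 19)
density = activation_density(acts)
print(f"{density:.3%}")  # 0.019%
```

On this reading, 0.019% corresponds to roughly 19 active positions per 100,000 sampled tokens, which marks the feature as sparse.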