Explanations
the word "nice" with varying strengths of activation
instances of the word "nice" and its variations
Negative Logits
ogens       -0.87
arians      -0.79
aer         -0.77
yrinth      -0.76
uality      -0.75
ivals       -0.74
inant       -0.73
WIND        -0.70
arian       -0.70
omics       -0.70
Positive Logits
gesture     0.82
ño          0.82
little      0.78
breeze      0.77
bye         0.76
tits        0.76
nice        0.76
touches     0.76
additions   0.73
bye         0.73
Activations Density: 0.030%