Explanations
interesting information or content
occurrences of the word "interesting"
Negative Logits
oise        -0.78
helle       -0.77
heed        -0.74
redited     -0.71
eded        -0.71
required    -0.70
uts         -0.70
otent       -0.69
aper        -0.69
reditation  -0.69
Positive Logits
tid           0.89
Flavoring     0.87
twists        0.84
arios         0.83
sidel         0.78
insights      0.77
trivia        0.75
anecdotes     0.71
observations  0.70
curiosity     0.69
Activations Density 0.037%