INDEX
Explanations
interesting or engaging information
mentions of the word "interesting."
Negative Logits
oise          -0.77
uts           -0.73
eded          -0.72
reditation    -0.71
helle         -0.71
xia           -0.70
arest         -0.70
heed          -0.67
chen          -0.66
aping         -0.66
Positive Logits
Flavoring     0.88
tid           0.84
lihood        0.77
Magikarp      0.77
trivia        0.75
twists        0.74
sidel         0.74
surprises     0.72
shade         0.71
curiosity     0.71
Activations Density: 0.026%