Explanations
instances of the word "button"
Negative Logits
Token        Logit
Flavoring    -0.80
ctuary       -0.78
Evening      -0.74
ETH          -0.70
ews          -0.68
Seraph       -0.68
ILY          -0.68
abama        -0.68
vironment    -0.67
Atmosp       -0.66
Positive Logits
Token        Logit
button       0.97
bell         0.95
holes        0.90
hole         0.90
oola         0.90
pus          0.87
button       0.87
buttons      0.82
header       0.78
clicked      0.78
Activations Density 0.019%