Explanations
mentions of the word "box" and its variations
Negative Logits
opsy      -0.17
ustrial   -0.15
ufen      -0.15
ufs       -0.15
opia      -0.15
urre      -0.14
reich     -0.14
icense    -0.14
auge      -0.14
прав      -0.14
Positive Logits
(es       0.34
<dyn      0.25
-sizing   0.24
ercise    0.22
tures     0.22
-shadow   0.20
(Box      0.18
idine     0.18
plorer    0.18
erif      0.18
Activations Density 0.029%