INDEX
Explanations
phrases related to the act of closing or silencing
Negative Logits
heet     -0.17
gent     -0.16
heets    -0.16
862      -0.16
ont      -0.16
iced     -0.15
/=       -0.15
hoot     -0.15
aggio    -0.14
idis     -0.14
Positive Logits
ters     0.40
shut     0.28
ting     0.28
tings    0.25
down     0.24
TING     0.24
-down    0.22
tdown    0.22
TERS     0.21
shuts    0.20
Activations Density: 0.008%