Explanations
phrases indicating abundance or quantity
Negative Logits
Token      Logit
ocker      -0.19
imest      -0.15
axter      -0.15
earch      -0.15
ooke       -0.14
cket       -0.14
erece      -0.14
oner       -0.14
emez       -0.14
ooth       -0.14
Positive Logits
Token      Logit
a          0.30
'o         0.24
more       0.23
-o         0.23
'a         0.20
happening  0.20
else       0.19
’a         0.18
af         0.18
aN         0.18
Activations Density: 0.013%