INDEX
Explanations
instances of the word "start" or its variations
Negative Logits
Mant          -0.15
IEL           -0.14
Into          -0.14
into          -0.14
wer           -0.14
ater          -0.14
into          -0.14
ysl           -0.14
com           -0.13
correspond    -0.13
Positive Logits
innoc         0.24
innocent      0.20
humble        0.19
hum           0.19
simply        0.17
small         0.17
nings         0.17
-simple       0.17
simples       0.17
Hum           0.16
Activations Density 0.041%