Explanations
phrases related to beginnings or initiating actions
Negative Logits
Token       Logit
ths         -0.16
upon        -0.15
enne        -0.15
oice        -0.14
-strokes    -0.14
DAQ         -0.14
earch       -0.14
ейств       -0.14
.Link       -0.14
Bris        -0.14
Positive Logits
Token       Logit
start       0.28
boxing      0.27
kicked      0.25
-start      0.24
starter     0.24
Kick        0.24
started     0.24
kick        0.23
kick        0.22
Kick        0.22
Activations Density 0.010%