INDEX
Explanations
instances of the word "start"
occurrences of the word "start."
Negative Logits
phy        -0.76
illard     -0.74
ugs        -0.65
iliary     -0.63
asca       -0.63
itsch      -0.62
obi        -0.60
entirety   -0.60
ocally     -0.59
otropic    -0.59
Positive Logits
nings      1.21
ners       0.90
starting   0.83
rek        0.78
anew       0.76
UP         0.75
around     0.75
watch      0.74
ribune     0.72
points     0.70
Activations Density 0.056%