INDEX
Explanations
instances of the word "down" in various contexts
Negative Logits
Token       Logit
Mits        -0.18
tle         -0.17
ipar        -0.17
orarily     -0.16
uer         -0.16
d           -0.16
to          -0.15
dz          -0.15
orate       -0.15
Runner      -0.14
Positive Logits
Token       Logit
pour        0.25
patrick     0.24
ey          0.24
graded      0.23
grading     0.22
playing     0.22
syndrome    0.21
grades      0.21
Syndrome    0.21
shift       0.21
Activations Density 0.019%
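
As a rough illustration of what a density figure like this could mean, the sketch below assumes that density is the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the toy data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Hypothetical helper: the page does not state how its density is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 19 active positions out of 100,000 tokens.
toy = [1.0] * 19 + [0.0] * (100_000 - 19)
density = activation_density(toy)
print(f"{density:.3%}")  # 0.019%
```

Under this assumed definition, a density of 0.019% would correspond to the feature firing on roughly 19 of every 100,000 tokens.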