Explanations
structured phrases indicating a recurring event or action
the phrase "Every 1" or similar repeated structures
Negative Logits
Token       Logit
dict        -0.73
hess        -0.67
olas        -0.64
Pru         -0.60
illions     -0.59
Jah         -0.59
ainment     -0.58
Metatron    -0.55
pron        -0.55
stripes     -0.54
Positive Logits
Token         Logit
THING         1.42
where         1.28
conceivable   1.04
body          1.02
single        0.97
day           0.95
WHERE         0.91
Ĥª            0.88
single        0.87
things        0.86
Activations Density 0.039%