Explanations
Occurrences of the word "first" followed by a number.
Negative Logits
Token          Logit
tics           -0.75
uba            -0.71
hawks          -0.70
iths           -0.67
ans            -0.64
ernels         -0.63
oops           -0.62
iversity       -0.61
iences         -0.60
today          -0.60
Positive Logits
Token          Logit
layer           1.13
iteration       1.13
paragraph       1.10
step            1.08
element         1.07
section         1.05
subparagraph    1.03
tier            1.01
most            1.00
half            0.99
Activation density: 0.137%