Explanations
Instances of the word "first" used as a prominent marker of introduction or emphasis in text.
Negative Logits
ukone      -0.91
méri       -0.83
LEncoder   -0.82
Scrolls    -0.79
aarrggbb   -0.78
Krieger    -0.77
okuyayım   -0.76
woordig    -0.75
Magdalene  -0.75
zeitung    -0.74
Positive Logits
FIRST      1.34
First      1.34
FIRST      1.27
First      1.20
first      1.05
first      1.00
getFirst   0.92
first      0.90
findFirst  0.84
isFirst    0.82
Activations Density: 0.110%