INDEX
Explanations
sentences containing measurements or technical specifications
the presence of punctuation marks, particularly periods
Negative Logits
Token          Logit
foolish        -0.89
slowing        -0.81
questioning    -0.80
swat           -0.76
rallying       -0.75
painfully      -0.75
tricked        -0.74
sooner         -0.74
stubborn       -0.74
tame           -0.73
Positive Logits
Token          Logit
Lastly         1.71
Finally        1.46
Additionally   1.35
<|endoftext|>  1.22
Overall        1.20
Specifications 1.16
Additional     1.16
Each           1.15
Lastly         1.14
Throughout     1.13
Activations Density 0.302%
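
As a rough illustration of how an activation density figure like the one above might be derived, the sketch below assumes density means the fraction of token positions where the feature's activation exceeds a threshold. The function name, the threshold of zero, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is the share of token positions on which the
    feature fires at all. The dashboard's actual definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.302% would mean the feature fires on roughly 3 of every 1000 tokens sampled.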