INDEX
Explanations
prefixes used in written text, such as "The" before specific nouns
occurrences of the word "The"
Negative Logits
reserve       -0.70
wound         -0.68
rank          -0.68
luck          -0.66
arch          -0.64
drop          -0.63
assigned      -0.63
care          -0.63
favor         -0.63
equivalent    -0.62
Positive Logits
The           2.30
There         1.72
ccording      1.72
THE           1.64
This          1.64
When          1.61
Both          1.58
It            1.58
While         1.56
Our           1.56
Activations Density: 0.231%