Explanations
proper nouns
the word "The" as it occurs repeatedly throughout the context
NEGATIVE LOGITS
swe          -0.68
wound        -0.67
equivalent   -0.62
unpaid       -0.61
luck         -0.60
impression   -0.60
shopping     -0.59
pal          -0.59
,            -0.59
blocker      -0.58
POSITIVE LOGITS
The          2.23
This         1.67
There        1.66
When         1.60
These        1.59
While        1.58
According    1.56
It           1.55
Those        1.54
During       1.54
Activations Density 0.211%