INDEX
Explanations
pairings or comparisons between different entities
the word "both" and its contexts in various phrases
Negative Logits
phase          -0.64
refriger       -0.63
template       -0.62
headquarters   -0.62
contra         -0.62
en             -0.61
signature      -0.61
               -0.61
territory      -0.61
secret         -0.61
Positive Logits
both           3.55
bottom         1.55
especially     1.50
particularly   1.48
those          1.43
either         1.40
also           1.36
many           1.33
these          1.32
mostly         1.23
Activations Density 0.020%
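
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below illustrates that reading only. It assumes "density" means the percentage of tokens with an activation above zero, which the page itself does not define, and the function name, threshold parameter, and sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when its activation is
    strictly greater than the threshold (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 2 activate the feature.
sample = [0.0] * 9998 + [1.7, 0.4]
print(f"{activation_density(sample):.3f}%")  # prints "0.020%"
```

Under this reading, a density of 0.020% corresponds to roughly 2 active tokens per 10,000 sampled.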