Explanations
instances of drawing or depicting differences or outlines between various entities or concepts
phrases that involve drawing actions or comparisons
Negative Logits
Token        Logit
Ü            -0.82
hops         -0.67
ntil         -0.66
staking      -0.65
Boost        -0.65
TED          -0.64
ruciating    -0.63
pher         -0.63
speak        -0.63
Chance       -0.60
Positive Logits
Token          Logit
conclusions    1.31
parallels      1.08
conclusion     1.08
curtains       1.05
ire            1.05
inference      1.04
attention      0.90
distinctions   0.90
line           0.90
lines          0.87
Activations Density: 0.058%