INDEX
Explanations
phrases related to consistency and comparison
instances of the word "the" and its variations, indicating a focus on definite articles or references
Negative Logits
caches        -0.67
vernment      -0.63
Versions      -0.60
DB            -0.60
briefly       -0.60
ells          -0.59
each          -0.59
periodically  -0.58
mares         -0.58
etheus        -0.58
Positive Logits
same          1.45
same          1.27
Same          1.05
easiest       0.99
result        0.96
ologically    0.96
simplest      0.95
opposite      0.95
smallest      0.94
utmost        0.94
Activations Density 0.230%