INDEX
Explanations
proper nouns in phrases
the conjunction "and" in various contexts
Negative Logits
artifacts   -0.79
wcs         -0.78
mining      -0.77
uggest      -0.72
minist      -0.72
system      -0.70
climate     -0.69
vernment    -0.69
imum        -0.69
pmwiki      -0.68
Positive Logits
Sons        0.92
Morty       0.91
Ellie       0.89
Mary        0.86
Buster      0.85
Clyde       0.84
Paige       0.84
Donna       0.84
Kyl         0.84
Jerry       0.84
Activations Density 0.119%