INDEX
Explanations
the presence of the phrase "the" in various contexts
Negative Logits
similarly     -0.86
similar       -0.74
Similar       -0.74
similar       -0.74
Similarly     -0.72
Similarly     -0.71
Similar       -0.70
Like          -0.65
SIMILAR       -0.63
equally       -0.63
Positive Logits
ſame          1.19
myſelf        1.10
itſelf        0.98
zelve         0.91
themſelves    0.89
sae           0.89
saine         0.86
samym         0.86
sane          0.86
Theſe         0.86
Activations Density 0.135%