Explanations
instances of the word "the" in various contexts
NEGATIVE LOGITS
similarly    -0.18
Erotik       -0.18
podob        -0.17
simil        -0.17
аниц         -0.16
Similar      -0.15
Similar      -0.15
essler       -0.15
cision       -0.15
ngr          -0.15
POSITIVE LOGITS
sam          0.39
same         0.37
same         0.34
même         0.31
sam          0.31
samo         0.31
sami         0.30
Sam          0.28
sane         0.28
sa           0.27
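Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding and keeping the most negative and most positive entries. The sketch below assumes that method, which this page does not state. The function names and the toy vectors are hypothetical placeholders, not this feature's real weights.

```python
# Hypothetical sketch: rank tokens by a feature's direct logit effect.
# Assumption: effect(token) = decoder_direction . unembedding_vector(token).

def logit_effects(decoder_row: list[float], unembed: dict[str, list[float]]) -> dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_row, vec)) for tok, vec in unembed.items()}


def top_and_bottom(effects: dict[str, float], k: int = 10):
    """Return (k most negative, k most positive) (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # ascending: most negative first
    positive = ranked[::-1][:k]  # descending: most positive first
    return negative, positive


# Toy example with made-up 2-dimensional vectors.
toy_unembed = {"same": [1.0, 0.0], "similar": [-1.0, 0.0], "the": [0.0, 1.0]}
effects = logit_effects([0.5, 0.1], toy_unembed)
neg, pos = top_and_bottom(effects, k=1)
```

With the toy inputs, `neg` holds `("similar", -0.5)` and `pos` holds `("same", 0.5)`.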
Activations Density 0.044%
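Activation density is usually read as the percentage of sampled tokens on which the feature fires. Under that reading, 0.044% is about 44 activating tokens per 100,000. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the helper name are assumptions for illustration.

```python
# Hypothetical sketch: activation density as a percentage of tokens.
# Assumption: a token "activates" the feature when its activation > threshold.

def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 44 nonzero activations among 100,000 tokens -> about 0.044 (%).
sample = [1.0] * 44 + [0.0] * (100_000 - 44)
density = activation_density(sample)
```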