INDEX
Explanations
proper nouns, particularly names and titles
Negative Logits
houses    -0.17
house     -0.16
ãĥ£       -0.16
Ïį        -0.16
تÙĪÙĨ     -0.16
hoe       -0.16
t         -0.15
tober     -0.15
tas       -0.15
tan       -0.15
Positive Logits
cular     0.24
UARIO     0.21
ser       0.20
yne       0.19
cript     0.19
Maxim     0.18
aurus     0.18
663       0.18
son       0.18
zc        0.17
Activations Density 0.078%
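
As a rough illustration of how a density figure like the one above is commonly read: it is usually the fraction of token positions on which the feature fires. The sketch below assumes that definition; the function name `activation_density`, the zero threshold, and the toy activation list are all hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 8 of which are active.
toy_activations = [0.0] * 9992 + [1.3] * 8
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density well below 1% means the feature is sparse: it is silent on the overwhelming majority of tokens.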