Explanations
proper nouns of individuals, particularly those named Adam
Negative Logits
ice      -0.17
-o       -0.15
infl     -0.15
prive    -0.14
iang     -0.14
rv       -0.13
ああ     -0.13
er       -0.13
486      -0.13
eso      -0.13
Positive Logits
adam     0.19
onas     0.18
odash    0.17
.Adam    0.16
Adam     0.16
ventus   0.16
adam     0.16
действ   0.15
ad       0.15
adro     0.15
Activations Density 0.030%