Explanations
Occurrences of proper nouns, specifically those beginning with "Ch".
Negative Logits
plat       -0.15
átor       -0.15
618        -0.15
aria       -0.14
dep        -0.14
arts       -0.14
Nib        -0.14
plat       -0.14
067        -0.14
rom        -0.14
Positive Logits
atham       0.22
umph        0.18
apult       0.17
umont       0.17
âte         0.16
idding      0.16
omor        0.16
iang        0.16
Scalars     0.15
appa        0.15
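Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive entries. The sketch below assumes that method; `logit_effects`, `top_and_bottom`, and the toy weights are hypothetical and are not taken from this dashboard.

```python
def logit_effects(decoder_dir, W_U):
    """Project a feature's decoder direction onto every vocabulary token.

    Assumed layout (hypothetical): W_U has one row per residual-stream
    dimension and one column per vocabulary token.
    """
    n_vocab = len(W_U[0])
    return [
        sum(decoder_dir[d] * W_U[d][t] for d in range(len(decoder_dir)))
        for t in range(n_vocab)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most negative and k most positive (token, logit) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["tok0", "tok1", "tok2", "tok3"]
W_U = [[0.1, -0.2, 0.3, 0.0],
       [0.2, 0.0, -0.4, 0.1]]
decoder_dir = [1.0, 0.5]

negative, positive = top_and_bottom(logit_effects(decoder_dir, W_U), vocab, k=2)
print(negative)  # most strongly suppressed tokens first
print(positive)  # most strongly promoted tokens first
```

A real dashboard would typically use k = 10 over the full vocabulary, which matches the ten entries shown in each list.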
Activation Density: 0.015%
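Activation density is usually reported as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly greater than zero; the function name `activation_density` and the threshold parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > threshold
    (0.0 by default, as with a ReLU-style sparse autoencoder).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# A density of 0.015% means the feature fires on roughly 15 of every 100,000 tokens.
sample = [0.0] * 99_985 + [1.2] * 15
print(activation_density(sample))
```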