Explanations
references to specific individuals or notable figures
Negative Logits
Token     Logit
amac      -0.15
θο        -0.14
Bureau    -0.14
oker      -0.14
acular    -0.13
à«        -0.13
ncpy      -0.13
Ej        -0.13
owie      -0.13
Taj       -0.13
Positive Logits
Token     Logit
esch      0.33
eme       0.31
leich     0.31
ew        0.31
lied      0.28
eden      0.28
eb        0.27
egen      0.26
ottes     0.26
es        0.25
Activations Density: 0.011%