INDEX
Explanations
references to prominent literary figures and their works
Negative Logits
Token      Logit
oley       -0.16
oland      -0.16
วย         -0.15
jerne      -0.15
åħ         -0.14
_season    -0.14
iglia      -0.14
patri      -0.14
Season     -0.14
anas       -0.14
Positive Logits
Token        Logit
EDA          0.20
eda          0.20
Rowling      0.19
JK           0.18
dete         0.17
Dumbledore   0.17
hog          0.15
SOR          0.15
Warner       0.15
Potter       0.15
Activations Density 0.010%