INDEX
Explanations
references to literary culture and famous authors
Negative Logits
reh         -0.16
tember      -0.15
imuth       -0.15
Orleans     -0.15
CF          -0.15
fang        -0.14
ves         -0.14
Jenner      -0.14
enschaft    -0.14
Barbar      -0.14
Positive Logits
Joyce       0.40
Dublin      0.39
Joy         0.30
Bloom       0.29
Irish       0.27
Dub         0.26
Ireland     0.26
Blo         0.25
Stephen     0.24
Blo         0.24
Activations Density 0.005%