INDEX
Explanations
proper nouns, particularly names
Negative Logits
baum      -0.18
rade      -0.16
ria       -0.15
bef       -0.15
ιας       -0.15
ni        -0.15
senal     -0.14
gezocht   -0.14
ooks      -0.14
bart      -0.14
Positive Logits
ise       0.29
vre       0.22
isa       0.20
nger      0.20
ie        0.19
Lou       0.19
verture   0.18
igi       0.18
loud      0.18
ette      0.17
Activations Density 0.006%