Explanations
proper nouns related to significant individuals or entities
Negative Logits
Token      Logit
elle       -0.21
eh         -0.21
ess        -0.21
ex         -0.20
essa       -0.19
ine        -0.18
alls       -0.17
esser      -0.17
ev         -0.17
els        -0.17
Positive Logits
Token      Logit
boro        0.19
orraine     0.19
abeled      0.19
alu         0.18
isle        0.17
toi         0.17
uster       0.17
homme       0.17
íky         0.17
ighth       0.17
Activation Density 0.067%
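The page does not say how these numbers are produced. Dashboards of this kind commonly compute the logit lists by multiplying the feature's decoder direction by the model's unembedding matrix and reporting the most negative and most positive entries, and they report activation density as the share of tokens on which the feature's activation is above zero (so 0.067% would mean roughly 7 tokens in 10,000). The following is a minimal pure-Python sketch under those assumptions. The function names `top_logit_effects` and `activation_density` and the toy inputs are hypothetical and are not the dashboard's own code.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Assumed method: project a feature's decoder direction (length d_model)
    through an unembedding matrix (d_model x vocab_size) and return the k most
    negative and k most positive (token, effect) pairs."""
    d_model = len(decoder_dir)
    # effect[j] = sum_i decoder_dir[i] * unembed[i][j]
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    # Reverse the tail so the largest positive effect comes first.
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


def activation_density(activations: Sequence[float]) -> float:
    """Assumed definition: fraction of tokens whose activation is above zero."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
    neg, pos = top_logit_effects(
        decoder_dir=[1.0, -0.5],
        unembed=[[0.2, -0.1, 0.0, 0.3], [0.4, 0.2, -0.2, 0.0]],
        vocab=["a", "b", "c", "d"],
        k=1,
    )
    print(neg, pos)
    print(activation_density([0.0, 1.2, 0.0, 0.0]))
```

In the toy run, token "b" has the most negative effect and "d" the most positive, and one of four activations is nonzero, giving a density of 0.25. A real dashboard would do the same projection over a full vocabulary with a matrix library rather than nested Python loops.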