INDEX
Explanations
names or references related to a specific person, possibly associated with legal or political contexts
Negative Logits
ORIG      -0.75
Lauder    -0.74
drift     -0.65
Atlantic  -0.63
ashtra    -0.63
Clash     -0.62
agher     -0.61
Irma      -0.61
Surviv    -0.60
Devi      -0.60
Positive Logits
lishing   1.29
bing      1.23
rious     1.19
lique     1.16
bles      1.15
lish      1.13
lisher    1.13
ilant     1.08
bed       1.08
bish      1.05
Activations Density 0.024%