Explanations
proper nouns or names, potentially of individuals
the letter 'W'
Negative Logits
psy              -0.71
EVE              -0.66
consensual       -0.66
satisfied        -0.66
unaff            -0.65
fewer            -0.65
earth            -0.64
reputable        -0.63
conscientious    -0.62
scraping         -0.61
Positive Logits
iesel            1.35
oj               1.29
irth             1.28
ahl              1.25
essel            1.25
asser            1.24
ohl              1.24
augh             1.21
ither            1.19
orthy            1.19
Activations Density 0.029%