INDEX
Explanations
names that start with "els" and are followed by a combination of letters
occurrences of the substring "els" within words
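
The two explanations above describe simple text patterns, so a rough way to sanity-check them against sample text is with regular expressions. This is only a sketch: the pattern names, the helper functions, and the sample sentence are hypothetical, and the patterns paraphrase the explanations rather than reproduce what the feature actually computes.

```python
import re

# Hypothetical patterns paraphrasing the two explanations; they are
# illustrative assumptions, not derived from the feature's weights.
# 1) A word that starts with "els"/"Els" followed by one or more letters.
STARTS_WITH_ELS = re.compile(r"\b[Ee]ls[A-Za-z]+\b")
# 2) Any word containing the substring "els" (case-insensitive).
CONTAINS_ELS = re.compile(r"\b[A-Za-z]*els[A-Za-z]*\b", re.IGNORECASE)


def names_starting_with_els(text):
    """Return words that begin with 'els' and continue with letters."""
    return STARTS_WITH_ELS.findall(text)


def words_containing_els(text):
    """Return words that contain the substring 'els' anywhere."""
    return CONTAINS_ELS.findall(text)


# Made-up sample sentence, used only to exercise the patterns.
sample = "Elsa met Elsinore near the hotels and a Welsh model."
starts = names_starting_with_els(sample)
contains = words_containing_els(sample)
```

The second pattern is a superset of the first, which matches how the second explanation reads as the broader of the two.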
Negative Logits
Occupations  -0.69
ALLY         -0.66
citiz        -0.65
Seym         -0.64
spe          -0.64
Augusta      -0.62
acity        -0.62
dividend     -0.61
scribed      -0.61
ythm         -0.60
Positive Logits
inki         1.10
warm         1.00
bach         0.99
hof          0.97
pace         0.91
ong          0.89
ounge        0.86
kamp         0.86
ength        0.84
zeb          0.82
Activations Density: 0.040%