INDEX
Explanations
names or proper nouns ending in 'ly'
adverbs ending in 'ly'
Negative Logits
ilater        -0.91
aciously      -0.84
ifully        -0.82
ilogy         -0.75
artifacts     -0.72
itive         -0.70
irlf          -0.70
ilaterally    -0.69
indo          -0.69
arsity        -0.69
Positive Logits
tics          1.13
rics          1.05
phant         0.94
mph           0.92
rical         0.87
sis           0.87
ndra          0.84
ffe           0.82
lene          0.82
nda           0.81
Activations Density: 0.038%