INDEX
Explanations
descriptive adjectives followed by specific nouns
Negative Logits
більш        0.35
upregulated  0.30
सतत          0.29
ہونا         0.29
ādi          0.29
情况         0.28
Một          0.28
altamente    0.28
訌           0.28
ایک          0.28
Positive Logits
ly        0.36
dramas    0.35
orchids   0.33
smirk     0.33
grin      0.33
horrors   0.33
mutta     0.33
autumnal  0.33
5         0.33
but       0.33
Activations Density: 0.077%