INDEX
Explanations
mentions of awards or honors related to people or teams in sports
Negative Logits
abble    -0.16
妮       -0.15
lings    -0.15
olin     -0.15
elle     -0.15
ponent   -0.14
obili    -0.14
labs     -0.14
ious     -0.14
urg      -0.14
Positive Logits
endar    0.20
geme     0.20
ender    0.19
andro    0.17
vor      0.17
END      0.17
ahu      0.17
enda     0.16
iasi     0.16
Ù        0.15
Activations Density 0.053%