INDEX
Explanations
mentions of a specific name or entity related to a prominent individual or brand
Negative Logits
ascus     -0.15
osi       -0.15
673       -0.15
814       -0.15
bet       -0.15
ration    -0.14
607       -0.14
eza       -0.14
Ñĭй       -0.14
yne       -0.14
Positive Logits
ieri      0.27
bir       0.25
vier      0.20
unc       0.19
CHO       0.19
cho       0.18
ve        0.18
orex      0.18
awe       0.17
allo      0.17
Activations Density 0.012%