Explanations
Sections of text related to news and entertainment categories
Negative Logits
zan     -0.16
opsis   -0.15
zw      -0.14
ella    -0.14
ëŁŃ     -0.13
uhan    -0.13
):?>↵   -0.13
immer   -0.13
IFA     -0.13
jev     -0.13
Positive Logits
anta    0.15
tek     0.15
anya    0.14
Rp      0.14
iel     0.14
ng      0.14
abr     0.14
‘       0.14
Vic     0.14
why     0.13
Activation Density: 0.013%