INDEX
Explanations
punctuation and sentence structure elements
Negative Logits
ople      -0.17
lic       -0.16
acher     -0.14
02        -0.14
æīį       -0.14
ask       -0.14
ÅĦ        -0.14
ÅŁÄ±      -0.14
lear      -0.13
eger      -0.13
Positive Logits
-article  0.18
едак      0.18
article   0.18
Article   0.18
öl        0.17
author    0.16
elong     0.16
writer    0.16
elden     0.16
articles  0.15
Activations Density 0.049%