Explanations
URLs or website links in the document
NEGATIVE LOGITS
plex    -0.15
lier    -0.15
řeb     -0.15
erk     -0.15
tle     -0.14
report  -0.14
yb      -0.14
スク    -0.14
ered    -0.14
edly    -0.14
POSITIVE LOGITS
-content  0.47
/wp       0.39
content   0.33
content   0.28
Content   0.28
_content  0.27
Content   0.27
.wp       0.26
_CONTENT  0.26
ontent    0.26
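The lists above could be produced as in the following minimal sketch. It assumes the common sparse-autoencoder dashboard convention: the feature's decoder direction is projected through the model's unembedding matrix, and the highest and lowest entries are kept. The page does not confirm this convention or say whether any normalization is applied first. The helper `top_logit_effects` and the toy vectors are hypothetical.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct effect on the logits.

    decoder_dir: (d_model,) decoder direction of one feature (assumed input).
    W_U: (d_model, vocab_size) unembedding matrix (assumed input).
    vocab: list of token strings, indexed like the columns of W_U.
    """
    # Direct logit effect: project the decoder direction through the unembedding.
    effects = decoder_dir @ W_U  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending, most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy usage with a hypothetical 2-dimensional model and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1, 0.0],
                [0.3, 0.3, 0.3, 0.3]])
vocab = ["content", "plex", "/wp", "tle"]
pos, neg = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

Sorting once with `argsort` and reading both ends gives both lists from a single pass over the vocabulary.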
Activation density: 0.008%
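Activation density is usually reported as the fraction of token positions on which a feature fires. The following is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero, since the page does not state its threshold. `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: 8 active positions out of 100,000 tokens.
sample = [0.0] * 99_992 + [1.0] * 8
density = activation_density(sample)
print(f"{density * 100:.3f}%")
```

At 0.008%, the feature would be active on roughly 8 of every 100,000 tokens (0.008 / 100 = 8 × 10⁻⁵), which makes it very sparse.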