Explanations
instances of website URL structures
Negative Logits
Token       Logit
agas        -0.16
populist    -0.15
uide        -0.15
reed        -0.15
leigh       -0.14
edic        -0.14
ted         -0.14
popul       -0.14
é           -0.13
Seg         -0.13
Positive Logits
Token       Logit
-content    0.34
/wp         0.27
content     0.24
Content     0.21
wp          0.21
content     0.21
-json       0.20
(wp         0.20
CONTENT     0.19
CONTENT     0.18
Activations Density: 0.005%