INDEX
Explanations
various URLs and related alphanumeric patterns
the presence of web URLs or references to online content
Negative Logits
t  -0.85
dor  -0.85
lining  -0.76
sburg  -0.74
wagon  -0.74
met  -0.73
rat  -0.73
tein  -0.72
lus  -0.71
mus  -0.71
Positive Logits
ecd  1.16
usterity  1.04
uthor  1.04
cean  1.01
pport  0.95
xt  0.95
velength  0.94
vern  0.94
hei  0.88
qi  0.87
Activation Density: 0.114%
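A density of 0.114% corresponds to about 1.14 firing positions per 1,000 tokens. The sketch below shows one way such a figure can be computed. It assumes that density means the percentage of token positions where the feature's activation is strictly above zero. The helper name `activation_density` and its `threshold` parameter are hypothetical and are not taken from the dashboard's own code.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of token positions where the feature fires.

    Hypothetical helper: a position counts as "firing" when its activation
    is strictly greater than `threshold` (0.0 by default, an assumption).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Toy data (not from this feature): 2 firing positions out of 1,000 tokens.
toy = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(toy):.3f}%")  # prints 0.200%
```

A very low density like this one usually indicates a narrowly targeted feature. That is consistent with the explanations above, which describe it as responding to URLs and related alphanumeric patterns.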