INDEX
Explanations
numeric values mixed with special characters and symbols
the occurrence of the year "2007" and symbols indicating fractions
Negative Logits
wagen
-0.82
andom
-0.77
APH
-0.75
¿½
-0.74
hyde
-0.73
axy
-0.70
ument
-0.70
ATES
-0.68
xtap
-0.66
phas
-0.66
POSITIVE LOGITS
lers
0.82
lishes
0.80
esville
0.79
iership
0.77
herry
0.77
elsius
0.77
lished
0.76
iano
0.75
icut
0.73
rir
0.72
Activations Density 0.096%