INDEX
Explanations
dates in textual format
the end of a text or document
Negative Logits
EVERY         -0.88
goddamn       -0.86
darn          -0.83
damn          -0.82
damned        -0.77
fucking       -0.76
#$            -0.75
infinitely    -0.75
dudes         -0.74
ALWAYS        -0.73
Positive Logits
resa          1.04
nsic          0.93
iday          0.90
odore         0.89
stanbul       0.86
respond       0.84
spokeswoman   0.83
intendent     0.80
anmar         0.79
swers         0.79
Activations Density: 0.356%