Explanations
references to something that is not new
references to the concept of "newness."
Negative Logits
ignt        -0.76
++++        -0.73
PRESS       -0.69
teasp       -0.65
Bastard     -0.64
everal      -0.63
sqor        -0.62
++++++++    -0.59
verning     -0.58
Brach       -0.57
Positive Logits
bie         1.04
bies        0.82
yk          0.75
phenomenon  0.75
hart        0.73
Zealand     0.72
owan        0.71
actionDate  0.71
aeus        0.71
Reilly      0.71
Activations Density 0.075%