Explanations
dates in a specific format
dates and numerical references associated with events
Negative Logits
avorite      -0.72
orable       -0.70
secretaries  -0.67
ModLoader    -0.64
netflix      -0.63
imei         -0.62
careless     -0.62
habitual     -0.60
chwitz       -0.59
cific        -0.59
Positive Logits
th            0.92
Interstitial  0.81
WH            0.76
05            0.75
Feb           0.73
09            0.72
0002          0.72
TH            0.72
07            0.71
04            0.70
Activations Density 0.052%