INDEX
Explanations
dates of events
dates and their associated significance
Negative Logits
hobbies        -0.64
igham          -0.61
departments    -0.60
orable         -0.59
loves          -0.58
longstanding   -0.56
persecuted     -0.56
dwell          -0.56
icable         -0.55
GGGGGGGG       -0.55
Positive Logits
th             1.34
rd             0.95
ths            0.91
TH             0.83
teenth         0.82
eteenth        0.73
thus           0.69
ember          0.68
nd             0.68
ools           0.67
Activations Density: 0.086%