Explanations
recently mentioned or previously mentioned items
references to time indicators related to recent or previously occurred events
Negative Logits
Token        Logit
atism        -0.84
afety        -0.69
ILY          -0.68
ibility      -0.67
Cohn         -0.67
DRAG         -0.63
ility        -0.61
olor         -0.60
letico       -0.60
Integrity    -0.60
Positive Logits
Token        Logit
existing     1.14
mentioned    1.09
deceased     1.02
unseen       1.02
discussed    0.96
unsus        0.96
mentioned    0.95
released     0.94
departed     0.92
occurring    0.91
Activations Density: 0.125%