Explanations
dates and temporal references
references to specific time markers or temporal phrases
Negative Logits
Token        Logit
artifacts    -0.66
honored      -0.62
Nanto        -0.61
jri          -0.61
ARI          -0.61
prosec       -0.60
prosecuted   -0.60
hesitate     -0.59
guided       -0.59
imir         -0.58
Positive Logits
Token        Logit
standards    0.75
stocks       0.72
rox          0.67
century      0.67
exhaustion   0.66
End          0.65
Closure      0.65
三           0.65
thirds       0.64
end          0.62
Activations Density 0.085%
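
The density figure above can be read as a simple ratio. The following is a minimal sketch, assuming that "activation density" means the percentage of sampled token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`.

    Assumption: density is the share of active token positions, in percent.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample (not data from the dashboard):
# 17 active positions out of 20,000 tokens.
sample = [0.0] * 19_983 + [1.5] * 17
print(f"{activation_density(sample):.3f}%")  # prints 0.085%
```

Under this reading, a density of 0.085% would mean the feature fires on roughly 1 in every 1,200 tokens, so it is a sparse feature.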