INDEX
Explanations
instances of events or items making their first appearance or being introduced
instances of the word "debut" and related forms
Negative Logits
fax          -0.68
enough       -0.68
Downloadha   -0.65
GO           -0.64
hate         -0.63
learn        -0.63
gor          -0.60
Boh          -0.58
average      -0.57
Rutherford   -0.57
Positive Logits
antes        1.43
ante         1.30
ants         1.09
ant          1.03
antly        0.95
ary          0.85
episode      0.82
album        0.78
ees          0.77
ing          0.76
Activations Density: 0.057%