INDEX
Explanations
references to TV shows or movies within a specific year
the presence of the word "in."
Negative Logits
ĸļ          -0.80
ĨĴ          -0.78
Jr          -0.71
CLASSIFIED  -0.65
Entered     -0.65
xual        -0.65
Revolution  -0.64
âĵĺ         -0.64
defe        -0.63
Marginal    -0.63
Positive Logits
essler      0.75
esa         0.71
retch       0.70
POP         0.68
ottest      0.67
kell        0.66
Osh         0.65
Chel        0.65
Tsukuyomi   0.65
Pon         0.63
Activations Density 0.000%