INDEX
Explanations
years or numerical indicators related to time
references to specific years
Negative Logits
showc         -0.71
ktop          -0.68
cumbers       -0.65
NetMessage    -0.64
horr          -0.62
contag        -0.62
choke         -0.62
ecast         -0.61
bluff         -0.61
footh         -0.60
Positive Logits
long          1.16
nings         1.12
book          1.12
ning          1.10
lings         1.02
books         0.95
lies          0.91
olds          0.90
ns            0.89
Ago           0.86
Activations Density: 0.109%