INDEX
Explanations
dates and time periods
phrases indicating time that reference recent events or statistics
Negative Logits
bart        -0.84
abus        -0.75
anie        -0.74
potion      -0.73
bear        -0.71
BILITIES    -0.70
aves        -0.68
pta         -0.66
isons       -0.66
obic        -0.64
Positive Logits
inception   1.14
rely        1.14
2009        0.94
1978        0.93
2005        0.93
2006        0.92
1950        0.92
1998        0.92
1999        0.92
2010        0.92
Activations Density: 0.050%