Explanations
time-related words and numbers
timestamps and numerical values
Negative Logits
udeb            -0.86
carbohyd        -0.79
iannopoulos     -0.74
eatures         -0.71
eleph           -0.71
necess          -0.70
akin            -0.68
tremend         -0.68
iliated         -0.66
unal            -0.64
Positive Logits
Where           0.94
Others          0.91
WRITE           0.90
Converted       0.89
Submit          0.87
=================================================================  0.82
Miscellaneous   0.82
Percentage      0.81
DISTR           0.81
Classification  0.80
Activations Density: 0.086%