Explanations
dates and time-related information
numeric values, particularly those related to times and rankings
Negative Logits
cart        -0.66
fencing     -0.64
scenery     -0.63
pastoral    -0.63
precip      -0.62
bleeding    -0.62
creeping    -0.61
separat     -0.61
vel         -0.61
plateau     -0.60
Positive Logits
120         0.90
148         0.88
118         0.84
145         0.83
114         0.83
â̲           0.83
sqor        0.82
203         0.82
147         0.82
802         0.82
Activations Density 0.156%