INDEX
Explanations
mentions of physical activity
instances of the word "exercise."
Negative Logits
Token     Logit
fixed     -0.86
oho       -0.76
gets      -0.73
lines     -0.73
ymes      -0.69
lining    -0.67
ener      -0.66
lined     -0.65
ocide     -0.64
alez      -0.63
Positive Logits
Token     Logit
ercise    0.93
exerc     0.88
exercise  0.79
issance   0.77
routines  0.76
Exercise  0.76
Pwr       0.75
icular    0.74
oleon     0.73
thur      0.72
Activations Density: 0.015%
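The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero, could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the source page does not define it.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 20,000 token positions.
acts = [0.0] * 19_997 + [1.2, 0.4, 3.1]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.015%"
```

Under this reading, a density of 0.015% means the feature is active on roughly 15 of every 100,000 tokens, which fits a narrow feature tied to the word "exercise."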