Explanations
locations or activities related to physical exercise
references to gyms and fitness-related activities
Negative Logits
Strait     -0.82
drawn      -0.68
theless    -0.66
afar       -0.65
ICT        -0.64
better     -0.64
Boe        -0.63
IBLE       -0.62
flawed     -0.62
angered    -0.62
Positive Logits
nas        1.60
rats       0.89
rosse      0.87
bell       0.85
cers       0.84
rition     0.82
nos        0.82
mers       0.79
lapt       0.78
gym        0.77
Activations Density: 0.011%