INDEX
Explanations
references to physical exercise locations, specifically the gym
references to a gym
Negative Logits
drawn       -0.90
angered     -0.73
Strait      -0.71
better      -0.70
••          -0.69
flawed      -0.65
Citizens    -0.64
OPLE        -0.63
theless     -0.62
UID         -0.60
Positive Logits
nas         1.35
gym         0.97
trainer     0.87
rats        0.83
Gym         0.83
lapt        0.76
trainers    0.75
corrid      0.74
rition      0.74
ido         0.73
Activations Density 0.005%
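
To put the density figure in perspective: 0.005% is 5 in 100,000, or roughly one token in 20,000. The sketch below assumes that "activation density" means the fraction of tokens on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration; the dashboard does not
    state how its density figure is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 token out of 20,000.
sample = [0.0] * 19_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.005%
```

Under this reading, a feature this sparse fires only on a narrow set of contexts, which fits the gym-related explanations listed above.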