Explanations
words related to rewards, numbers, and health conditions
numerical data or statistics related to losses or performance metrics
Negative Logits
ĸļ        -0.96
eanor     -0.74
isman     -0.70
older     -0.69
forcer    -0.69
iece      -0.65
emark     -0.65
metic     -0.64
icum      -0.64
itamin    -0.64
Positive Logits
ones       1.27
races      1.10
voices     1.06
lands      1.06
entities   1.03
ivities    1.03
falls      1.02
votes      1.02
trails     1.02
outputs    1.01
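The two ranked lists above give the tokens whose logits this feature pushes down and up the most. A minimal sketch of how such lists could be produced, assuming (this is not stated on the page) that each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The names `logit_effects` and `top_and_bottom` and the toy vocabulary are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by one feature's logit effect.
# Assumption: score(token j) = decoder_dir . unembed[:, j], where unembed is
# stored as a d_model x vocab_size list of rows.

def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, score) pairs for every token in the vocabulary."""
    scores = []
    for j, token in enumerate(vocab):
        # Dot product of the feature direction with column j of the unembedding.
        s = sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
        scores.append((token, s))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example with d_model = 2 and a four-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, 1.0, -0.4, 0.0],
    [0.4, 0.0, 1.0, -1.0],
]
vocab = ["a", "b", "c", "d"]
neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print(neg)
print(pos)
```

On a real model the same ranking would run over the full vocabulary, which is why subword fragments such as "eanor" or "ivities" can appear in the lists.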
Activation Density: 0.346%
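An activation density of 0.346% means the feature fires rarely, on roughly 346 of every 100,000 token positions. A minimal sketch of the computation, assuming density is the fraction of token positions where the activation is strictly greater than zero. The threshold convention and the function name `activation_density` are assumptions, not taken from the page.

```python
# Hypothetical sketch: activation density as the fraction of active positions.
# Assumption: a position counts as "active" when its activation exceeds 0.0.

def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 346 active positions out of 100,000 reproduces the density shown above.
acts = [0.0] * 99_654 + [1.0] * 346
density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```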