Explanations
references to physical force or power
references to "force" in various contexts
Negative Logits
Token         Logit
algia         -0.81
alam          -0.75
vironment     -0.73
ecause        -0.71
STER          -0.71
roma          -0.70
Hop           -0.69
artment       -0.69
andon         -0.69
DERR          -0.68
Positive Logits
Token         Logit
maj           1.29
Awakens       0.99
exerted       0.99
multiplier    0.93
fulness       0.87
force         0.84
vic           0.84
ps            0.79
induction     0.75
multipl       0.75
Activations Density 0.044%