Explanations
Words related to physical processes or transformations.
Negative Logits
adolu    -0.17
/on      -0.15
/up      -0.14
/from    -0.14
ad       -0.14
iger     -0.14
éĢŁ      -0.14   (decoded: 速, "speed")
пÑĸÑĪ    -0.13   (decoded: піш)
éĢĶ      -0.13   (decoded: 途, "way, route")
оÑĢон    -0.13   (decoded: орон)
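Some tokens in these lists, such as `éĢŁ` and `éĢĶ` above and `åĩºæĿ¥` in the positive list below, look garbled because they are shown in byte-level BPE form, where every UTF-8 byte is mapped to a printable character. The sketch below assumes the model uses a GPT-2-style byte-level vocabulary; it rebuilds GPT-2's byte-to-character table by hand (nothing is imported from a tokenizer library) and inverts it to recover readable text.

```python
# Assumption: the model uses a GPT-2-style byte-level BPE vocabulary.
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild GPT-2's reversible byte -> printable-character table."""
    # Printable bytes keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["éĢŁ", "éĢĶ", "åĩºæĿ¥", "êµ´"]:
    print(tok, "->", decode_token(tok))
```

Running it prints 速, 途, 出来 and 굴 for the four tokens. Tokens that mix already-decoded characters with byte-level ones (the Cyrillic entries above) would raise a `KeyError` here and need a per-character fallback.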
Positive Logits
out      0.21
-out     0.21
-up      0.20
up       0.16
-off     0.16
åĩºæĿ¥   0.16   (decoded: 出来, "come out")
LEAN     0.15
-down    0.15
off      0.14
êµ´      0.14   (decoded: 굴)
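Dashboards of this kind typically derive the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and reading off the largest and smallest entries. The sketch below assumes exactly that; the toy weights, the six-token vocabulary and the helper name `top_logit_effects` are hypothetical, and whether this dashboard folds in a final LayerNorm is not stated.

```python
import numpy as np


def top_logit_effects(decoder_dir: np.ndarray, W_U: np.ndarray,
                      vocab: list[str], k: int = 10):
    """Project one feature's decoder direction onto the unembedding.

    decoder_dir: shape (d_model,), the feature's row of the SAE decoder.
    W_U:         shape (d_model, vocab_size), the unembedding matrix.
    Returns the k most negative and the k most positive (token, value) pairs.
    """
    effects = decoder_dir @ W_U          # direct effect on each token's logit
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy weights; a real run would load the model's W_U and SAE W_dec.
rng = np.random.default_rng(0)
d_model = 8
vocab = ["out", "-up", "off", "/on", "/from", "iger"]
W_U = rng.normal(size=(d_model, len(vocab)))
decoder_dir = rng.normal(size=d_model)
decoder_dir /= np.linalg.norm(decoder_dir)   # assume unit-norm decoder rows

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

Sorting once and slicing both ends gives the two lists in the same order the dashboard shows them: most negative first, most positive first.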
Activation density: 0.324%
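Assuming the usual definition, activation density is the share of token positions in a sample dataset on which the feature's activation is above zero, so 0.324% means the feature fires on roughly 3 of every 1,000 tokens. The dashboard does not state its sample size or threshold; the activations below are hypothetical and chosen only to reproduce that figure.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature fires above `threshold`."""
    # Assumption: "fires" means activation strictly greater than the threshold.
    return float(np.mean(acts > threshold))


# Hypothetical activations over 100,000 tokens: the feature fires on 324.
acts = np.zeros(100_000)
acts[:324] = 1.5
print(f"{activation_density(acts):.3%}")  # prints 0.324%
```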