INDEX
Explanations
words and phrases associated with physical actions and their consequences
NEGATIVE LOGITS
aup     -0.15
-lfs    -0.14
kud     -0.13
Ups     -0.13
/up     -0.13
/down   -0.13
رسی     -0.13
aub     -0.13
ufs     -0.12
افت     -0.12
POSITIVE LOGITS
out     1.55
out     1.02
-out    1.01
出      0.94
Out     0.90
(out    0.84
_out    0.83
Out     0.81
out     0.80
OUT     0.79
Activations Density 1.323%