INDEX
Explanations
actions or processes related to decision-making and involvement in various tasks
Negative Logits
Carthy       -0.15
brat         -0.15
ضÛĮ          -0.15
ضÙĬ          -0.14
projection   -0.14
Pres         -0.14
véd          -0.14
uil          -0.14
askell       -0.14
Projection   -0.13
Positive Logits
inp          0.16
344          0.15
idle         0.15
ebp          0.14
636          0.14
ãĥ³ãĥij       0.14
jur          0.14
ÙĨج          0.14
fds          0.14
167          0.14
Activations Density: 0.025%