INDEX
Explanations
phrases related to instructions or sequential processes
Negative Logits
themſelves      -1.02
himſelf         -0.99
pleaſure        -0.93
myſelf          -0.90
Theſe           -0.88
BorderRadius    -0.85
poffe           -0.84
Jefus           -0.84
Monfieur        -0.81
leſs            -0.80
Positive Logits
inorder         1.37
afin            1.21
order           1.12
inorder         1.08
Afin            1.04
为了            1.01
чтобы           0.95
כדי             0.92
為了            0.91
Afin            0.91
Activations Density 0.162%