Explanations
phrases indicating the purpose or function of actions
Negative Logits
Token         Logit
Jefus         -1.63
pleaſure      -1.63
Monfieur      -1.51
myſelf        -1.49
themſelves    -1.49
Diſ           -1.48
houſe         -1.48
faſt          -1.44
Efq           -1.44
itſelf        -1.41
Positive Logits
Token         Logit
afin          0.85
inorder       0.82
to            0.72
order         0.67
To            0.62
Cio           0.60
أجل           0.59
inder         0.59
inorder       0.59
Afin          0.59
Activations Density: 0.056%
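
As a rough illustration of how a density figure like the one above could be read, the sketch below treats activation density as the fraction of token positions where the feature fires above a threshold. This definition, the function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the source does not state how the dashboard computes its value.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 56 of them active.
sample = [0.0] * 99_944 + [1.2] * 56
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.056% would mean the feature is active on roughly 56 of every 100,000 tokens, which fits a feature tied to a narrow phrase type.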