INDEX
    Explanations

    phrases indicating the purpose or function of actions

    Negative Logits
    " Jefus"         -1.63
    " pleaſure"      -1.63
    " Monfieur"      -1.51
    " myſelf"        -1.49
    " themſelves"    -1.49
    " Diſ"           -1.48
    " houſe"         -1.48
    " faſt"          -1.44
    " Efq"           -1.44
    " itſelf"        -1.41

    Positive Logits
    " afin"           0.85
    " inorder"        0.82
    " to"             0.72
    " order"          0.67
    " To"             0.62
    " Cio"            0.60
    " أجل"            0.59
    "inder"           0.59
    "inorder"         0.59
    "Afin"            0.59
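    The page does not say how these logit lists are produced. One common approach for a learned feature (for example, a sparse-autoencoder latent) is to project the feature's decoder direction through the model's unembedding matrix and read off the most negative and most positive entries. That method is an assumption here, not something the source confirms, and real pipelines may also apply a final layer norm first. The helpers `logit_effects` and `top_logits`, and the toy matrix below, are hypothetical and only illustrate the idea in plain Python:

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a (hypothetical) feature decoder direction through an unembedding.

    Assumes W_U is stored as d_model rows, each with one entry per vocab token.
    """
    d_model = len(decoder_dir)
    if len(W_U) != d_model:
        raise ValueError("W_U must have one row per model dimension")
    n_vocab = len(W_U[0])
    # effect[j] = sum_i decoder_dir[i] * W_U[i][j]
    return [sum(decoder_dir[i] * W_U[i][j] for i in range(d_model))
            for j in range(n_vocab)]


def top_logits(effects: Sequence[float], vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    pairs = list(zip(vocab, effects))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


# Toy example: 2-dimensional model, 4-token vocabulary (made-up numbers).
vocab = [" afin", " to", " Jefus", " houſe"]
W_U = [[1.5, 0.5, -1.0, -2.0],
       [0.0, 1.0, 1.0, 0.0]]
effects = logit_effects([1.0, 0.5], W_U)   # [1.5, 1.0, -0.5, -2.0]
neg, pos = top_logits(effects, vocab, k=2)
print(neg)  # [(' houſe', -2.0), (' Jefus', -0.5)]
print(pos)  # [(' afin', 1.5), (' to', 1.0)]
```

    Keeping the leading space in each vocabulary entry mirrors the lists above, where " inorder" and "inorder" are distinct tokens.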
    Activation Density 0.056%
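    The page reports only the density figure. Reading it, as an assumption, as the percentage of sampled tokens on which the feature's activation is above zero, 0.056% works out to about 56 firing tokens per 100,000. A minimal sketch under that assumed definition, with `activation_density` and its `threshold` parameter as hypothetical names:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy data: 56 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(56):
    acts[i * 1000] = 1.0
print(activation_density(acts))  # prints 0.056
```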

    No Known Activations