INDEX
    Explanations

    phrases related to instructions or sequential processes

    Negative Logits
     themſelves     -1.02
     himſelf        -0.99
     pleaſure       -0.93
     myſelf         -0.90
     Theſe          -0.88
     BorderRadius   -0.85
     poffe          -0.84
     Jefus          -0.84
     Monfieur       -0.81
     leſs           -0.80
    Positive Logits
     inorder   1.37
     afin      1.21
     order     1.12
    inorder    1.08
    Afin       1.04
    为了       1.01
     чтобы     0.95
     כדי       0.92
    為了       0.91
     Afin      0.91
    Activation Density: 0.162%

    No Known Activations