    Explanations

    phrases indicating intentions or actions aimed at achieving specific goals

    Negative Logits
     ſtate
    -0.89
     ftate
    -0.89
     Majefty
    -0.83
     Efq
    -0.80
     fubject
    -0.79
     houſe
    -0.79
     pleaſure
    -0.78
     whoſe
    -0.77
     chofe
    -0.76
     poffe
    -0.73
    Positive Logits
     Để
    0.93
     чтобы
    0.88
    为了
    0.86
     כדי
    0.85
     afin
    0.85
    Cyfeiriadau
    0.82
    Để
    0.82
    ůli
    0.81
     Чтобы
    0.80
    為了
    0.79
    Activation Density: 0.096%

    No Known Activations