    Explanations

    the infinitive form of verbs indicating purpose or intent

    Negative Logits
        Token              Logit
        "ImageContext"     -0.74
        "UnusedPrivate"    -0.68
        "İstinadlar"       -0.68
        "principalTable"   -0.67
        "GenerationType"   -0.64
        "PerformLayout"    -0.64
        "velopes"          -0.63
        "ensement"         -0.61
        "nocześnie"        -0.60
        " Superhosts"      -0.59
    Positive Logits
        Token              Logit
        "To"               1.56
        " To"              1.47
        " Toh"             0.97
        " Чтобы"           0.95
        " TO"              0.94
        "TO"               0.92
        " Để"              0.85
        "为了"             0.84
        "Чтобы"            0.84
        "For"              0.74
    Act Density 0.140%

    No Known Activations