INDEX
    Explanations

    phrases that introduce exceptions or limitations

    Negative Logits
    heimer    -0.16
    igy       -0.15
    kara      -0.15
    hiba      -0.15
    yor       -0.15
    aling     -0.14
     Vice     -0.14
    zí        -0.14
    yar       -0.14
    тин       -0.14
    Positive Logits
    ortho     0.17
    ottie     0.15
    enses     0.15
    etten     0.15
    υγ        0.14
    نس        0.14
    енз       0.14
    ender     0.13
    rier      0.13
    Dlg       0.13
    Act Density 0.009%

    No Known Activations
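
The activation density figure above can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the zero threshold, and the synthetic activation values are all hypothetical and are not taken from the dashboard or its tooling.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (active positions) / (all sampled positions).
    """
    activations = list(activations)
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example (not real data): 9 active positions out of 100,000.
synthetic = [1.0] * 9 + [0.0] * 99_991
density = activation_density(synthetic)
print(f"{density:.3%}")
```

A density this low means the feature is active on only a handful of positions per hundred thousand tokens, which is consistent with a dashboard that lists no known activating examples.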