INDEX
    Explanations

    phrases that convey comparisons or contrasts

    Negative Logits
    yna          -0.19
     mand        -0.17
    yn           -0.17
    rought       -0.16
    ynes         -0.16
    mand         -0.15
     mes         -0.15
    uw           -0.15
    odium        -0.14
    calar        -0.14
    Positive Logits
    iver         0.14
    'options     0.14
    insky        0.14
    istrovství   0.14
    erras        0.13
    leo          0.13
     Undefined   0.13
     hậu         0.13
    ợi           0.13
    片           0.13
    Act Density 0.075%

    No Known Activations