INDEX
    Explanations

    phrases indicating contrasts or alternatives in contexts

    Negative Logits
    oria      -0.18
    erable    -0.16
    ior       -0.15
    raki      -0.15
    νοÏį      -0.15
    rak       -0.14
    istant    -0.14
    odies     -0.14
    ijing     -0.14
     Cul      -0.14
    Positive Logits
    ISCO      0.15
    endar     0.14
    ileaks    0.14
     nÄĥ      0.13
    878       0.13
    íĦ¸       0.13
    assist    0.13
     spending 0.13
    lom       0.13
    iao       0.13
    Act Density 0.907%

    No Known Activations