    Explanations

    how, where, when, which describe actions

    NEGATIVE LOGITS
    Token           Logit
    "Semua"         -1.00
    "hichever"      -0.96
    "centaje"       -0.94
    " 所有"          -0.93
    " všet"         -0.91
    "Yosh"          -0.89
    "delivr"        -0.88
    " всі"          -0.88
    "lewood"        -0.87
    "moncler"       -0.87

    POSITIVE LOGITS
    Token           Logit
    " they"         1.55
    " we"           1.23
    " meals"        1.09
    " goods"        0.99
    "리를"           0.98
    " you"          0.93
    " these"        0.92
    " something"    0.91
    " इसे"          0.90
    " można"        0.89

    Act Density 0.133%

    No Known Activations