INDEX
    Negative Logits
    [token not shown]  0.39
    这个  0.36
     همچنین  0.34
    [token not shown]  0.34
     ซึ่ง  0.34
     যদি  0.33
    這個  0.33
     که  0.31
     that  0.31
    この  0.31
    Positive Logits
     통한  0.33
     difficult  0.33
     unsustainable  0.32
     relatable  0.30
     used  0.29
     위한  0.29
     stressful  0.29
     인한  0.29
     primeiras  0.29
     traumatic  0.28
    Activation Density: 1.807%

    No Known Activations