    Negative Logits
    "Ç"               0.37
    "ط"               0.37
    "ว่า"              0.36
    "гляд"            0.36
    "وال"             0.35
    "wym"             0.35
                      0.35
    "that"            0.35
    "isms"            0.35
    " kwamba"         0.35
    Positive Logits
    " want"           0.57
    " desperately"    0.47
    " quiero"         0.46
    " partake"        0.46
    " WANT"           0.44
    " revenge"        0.43
    " wants"          0.42
    " quería"         0.42
    " unbedingt"      0.41
    "want"            0.40
    Act Density 0.023%

    No Known Activations