INDEX
    Explanations
    Negative Logits
     as         1.14
     for        1.11
    ب           0.99
    C           0.93
    :           0.88
     Clon       0.82
     parar      0.80
    .’          0.78
     handler    0.77
     disse      0.77
    Positive Logits
    тся         1.05
    ла          1.00
     dazz       1.00
     sparkles   0.95
    ку          0.94
    я           0.94
                0.93
     dazzling   0.88
    to          0.87
    то          0.84
    Act Density 0.005%

    No Known Activations