INDEX
    Explanations

    universal applications and concepts

    Negative Logits
    "t"  0.65
    "o"  0.57
    "s"  0.54
    "r"  0.54
    "و"  0.52
    "整体"  0.51
    " разно"  0.51
    "m"  0.50
    " diversité"  0.48
    " разнообраз"  0.48
    Positive Logits
    "Universal"  0.91
    " universal"  0.88
    " universally"  0.77
    "universal"  0.74
    " Universal"  0.73
    " универса"  0.62
    " UNIVERS"  0.61
    " यूनिवर्सल"  0.57
    " universality"  0.56
    "nivers"  0.53
    Act Density 0.023%

    No Known Activations