    Explanations

    multilingual and diverse concepts

    Negative Logits
    " nicht"      1.13
    " auch"       1.02
    " allem"      0.96
    " presque"    0.95
    " asumir"     0.95
    " menand"     0.95
    " mwaka"      0.94
    " donde"      0.93
    " etwas"      0.93
    " not"        0.92
    Positive Logits
    (no visible token)    1.01
    "iculum"              1.00
    "vation"              0.99
    "Ди"                  0.98
    " Від"                0.98
    "ה"                   0.97
    (no visible token)    0.96
    "𝘿"                   0.94
    "dling"               0.93
    "localhost"           0.93
    Act Density 0.011%

    No Known Activations