INDEX
    Explanations

    multi-lingual abstract concepts

    Negative Logits
     bao          0.39
     promov       0.36
     desta        0.35
     verb         0.35
     voz          0.35
     non          0.35
     motto        0.35
     tomb         0.34
     g            0.34
     glorified    0.34
    Positive Logits
    ar        0.54
    sthe      0.53
    el        0.52
    an        0.50
    ות        0.48
    н         0.48
    y         0.48
    al        0.48
     ofthe    0.48
    k         0.47
    Act Density 0.001%

    No Known Activations