INDEX
    Explanations
    NEGATIVE LOGITS
    Token     Logit
    "i"       1.09
    "a"       1.02
    "an"      0.96
    "u"       0.82
    "}=\"     0.77
    "as"      0.76
              0.76
    "t"       0.73
    "j"       0.72
    "ي"       0.70
    POSITIVE LOGITS
    Token          Logit
    " Thor"        0.94
    "Thor"         0.89
    " THOR"        0.76
    "thor"         0.73
    " thoracic"    0.70
    " Loki"        0.70
                   0.66
    " viking"      0.66
    " thor"        0.66
    " pouvoir"     0.63
    Act Density 0.001%

    No Known Activations