    Explanations

    sword and swordsmanship

    Negative Logits
    Token               Value
    "is"                0.92
    (token not shown)   0.83
    "ek"                0.83
    "ли"                0.80
    "it"                0.77
    "ல்"                0.76
    "al"                0.73
    "اب"                0.70
    "us"                0.69
    "的具体"            0.69
    Positive Logits
    Token               Value
    " sword"            0.73
    (token not shown)   0.72
    "Sword"             0.70
    "ות"                0.66
    " Sword"            0.65
    " quente"           0.64
    "ફેદ"               0.63
    "あなたが"          0.63
    " elenc"            0.62
    " swords"           0.59
    Act Density 0.003%

    No Known Activations