    Explanations

    how to explain concepts

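    Negative Logits
        Token                   Logit
        " نسب"                  0.43
        " misuse"               0.42
        " medieval"             0.41
        " need"                 0.41
        " bastard"              0.40
        " uphold"               0.39
        " pupils"               0.38
        " accusation"           0.38
        " protop"               0.38
        " indire"               0.38

    Positive Logits
        Token                   Logit
        "フォーマンス"           0.47
        "ல்"                    0.42
        " Polytechnique"        0.42
        "EnglishMarks"          0.42
        "आरओ"                   0.41
        (token not rendered)    0.40
        (token not rendered)    0.40
        (token not rendered)    0.40
        "WinCounter"            0.39
        (token not rendered)    0.39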
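    Under the usual logit-lens reading, lists like these are the tokens whose unembedding vectors align most (positive) and least (negative) with the feature's decoder direction. The following is a minimal NumPy sketch under that assumption. `top_logits`, `w_dec` and `W_U` are hypothetical names, the weights are random stand-ins, and any scaling the dashboard applies to the displayed values is not modeled.

```python
import numpy as np


def top_logits(w_dec, W_U, k=10):
    """Hypothetical helper: project one feature's decoder direction through the
    unembedding and return the k most-promoted and k most-suppressed token ids."""
    logits = w_dec @ W_U              # (d_model,) @ (d_model, d_vocab) -> (d_vocab,)
    order = np.argsort(logits)        # token ids sorted by logit, ascending
    neg_ids = order[:k]               # most suppressed first
    pos_ids = order[::-1][:k]         # most promoted first
    return pos_ids, logits[pos_ids], neg_ids, logits[neg_ids]


# Toy example with random weights (assumption: not the real model's weights).
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 100
W_U = rng.standard_normal((d_model, d_vocab))
w_dec = rng.standard_normal(d_model)
pos_ids, pos_vals, neg_ids, neg_vals = top_logits(w_dec, W_U, k=10)
print(pos_vals)
print(neg_vals)
```

    In practice, `pos_ids` and `neg_ids` would be mapped back to token strings with the model's tokenizer. That mapping is why entries such as " misuse" keep their leading space.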
    Act Density 0.001%
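    A density of 0.001% corresponds to a fraction of 0.001 / 100 = 1e-5, or roughly one active token in 100,000. The following minimal sketch shows that arithmetic. It assumes density is the fraction of sampled tokens whose activation is strictly above zero. `activation_density` and the toy data are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature's
    activation is strictly above `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy activations over 200,000 tokens with only two nonzero entries.
acts = np.zeros(200_000)
acts[[5, 123_456]] = 1.3
density = activation_density(acts)  # 2 / 200,000 = 1e-5
print(f"{density:.5%}")
```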

    No Known Activations