    NEGATIVE LOGITS
    Token       Value
    "m"         1.09
    " was"      1.09
    "are"       1.08
    " are"      1.04
    "was"       1.03
    "it"        1.02
    "p"         0.98
    "،"         0.96
    "ד"         0.94
    "methyl"    0.93
    POSITIVE LOGITS
    Token       Value
    "3"         1.30
    "4"         1.11
    "8"         1.10
    "5"         1.09
    "9"         1.09
    "7"         1.08
    "6"         1.03
    " grumpy"   0.97
    "З"         0.97
    (blank)     0.96
    Act Density 0.065%

    No Known Activations