INDEX
    NEGATIVE LOGITS
    ]             1.15
    )             1.06
     و            1.02
    of            0.99
    us            0.98
    ä             0.98
    >             0.95
    _             0.94
    og            0.93
     stallions    0.93
    POSITIVE LOGITS
    на            1.48
                  1.37
    м             1.27
     in           1.17
    ne            1.15
    م             1.13
    na            1.11
    ين            1.08
    la            1.05
    ni            1.04
    Act Density 0.008%

    No Known Activations