    Negative Logits
    Token     Logit
    1         1.35
    on        0.98
    ren       0.92
    for       0.86
    ",        0.84
     for      0.84
    ts        0.83
    2         0.81
    ill       0.80
    ten       0.80
    Positive Logits
    Token               Logit
    ى                   1.38
    ک                   1.05
     ۳                  1.01
    ன்                  0.99
    <start_of_turn>     0.95
                        0.93
    િ                   0.92
                        0.91
    ่า                  0.90
    스의                0.90
    Activation Density: 0.217%

    No Known Activations