INDEX
    Explanations

    anticipating questions or objections

    Negative Logits
        д               1.21
        с               1.12
        ти              1.05
        د               0.98
        ime             0.91
        ier             0.90
        nesses          0.90
        ک               0.89
        iden            0.88
        या              0.87
    Positive Logits
        (not rendered)  1.17
        p               1.16
        0               1.16
        l               1.15
        w               1.13
        (not rendered)  1.13
        FOR             1.12
        3               1.11
        צ               1.10
        (not rendered)  1.10
    Activation Density: 0.005%

    No Known Activations