    NEGATIVE LOGITS
    (not shown)    1.65
    (not shown)    1.45
    ان             1.42
    ات             1.32
    <0x80>         1.31
    (not shown)    1.22
    (not shown)    1.19
    O              1.17
    F              1.16
    ر              1.16
    POSITIVE LOGITS
    ء              1.45
    𝒽              1.43
    𝓌              1.39
    THING          1.30
    (not shown)    1.27
    𝓈              1.25
    τ              1.24
    你們           1.20
    𝒹              1.20
    (not shown)    1.19
    Activation Density: 0.442%

    No Known Activations