INDEX
    Explanations
    Negative Logits
    ب  1.27
    (blank)  1.21
    م  1.12
    G  1.11
    ي  1.05
    D  1.03
    י  1.03
    وم  1.00
    (blank)  0.99
    C  0.98
    Positive Logits
     as  1.36
    ↵↵  0.95
    я  0.91
     turpentine  0.82
     sexually  0.80
     for  0.72
    го  0.71
    ти  0.71
    ing  0.69
    o  0.69
    Activation Density: 0.070%

    No Known Activations