INDEX
    Explanations
    Negative Logits
     ٣  0.56
    ٣  0.55
     surfboard  0.50
     chestnuts  0.49
     corros  0.48
    不會  0.48
    '".  0.48
     mors  0.47
     داله  0.47
     currants  0.47
    Positive Logits
    L  0.82
    R  0.76
    Y  0.75
    t  0.72
    O  0.71
    C  0.70
    B  0.69
    V  0.69
    S  0.68
    aw  0.67
    Activation Density: 0.035%

    No Known Activations