    Negative Logits
    "ע"    2.00
    "ق"    1.63
    "га"   1.57
    "א"    1.46
    "ي"    1.44
    "ه"    1.44
    "ط"    1.43
    "וס"   1.39
    "هي"   1.37
    "ܒ"    1.33
    Positive Logits
    "aching"    1.55
    "来越"      1.47
    "te"        1.45
    "akers"     1.44
    " foi"      1.42
    "ts"        1.40
    "elling"    1.39
    "ms"        1.38
    " d"        1.38
    " praise"   1.37
    Activation density: 0.006%

    No known activations.