    NEGATIVE LOGITS
    1.35
    {
    1.32
    e
    1.31
    y
    1.29
    it
    1.20
     guilt
    1.20
    יים
    1.16
    %
    1.15
    י
    1.15
    _
    1.13
    POSITIVE LOGITS
    1.62
    اس
    1.58
    ен
    1.43
    нан
    1.38
    细节
    1.36
    ្ន
    1.35
    리에
    1.35
    ро
    1.34
    েল
    1.30
     closely
    1.30
    Act Density 0.153%

    No Known Activations