INDEX
    Explanations

    encouragement for further action

    Negative Logits
    Token        Value
    "ing"        1.19
    "ang"        1.19
    "ि"          1.09
    "ong"        1.07
    " It"        1.05
    " are"       1.02
    "ter"        1.00
    "ោក"         0.98
    (not shown)  0.97
    "h"          0.96
    Positive Logits
    Token        Value
    (not shown)  1.55
    ","          1.43
    "ר"          1.43
    ";"          1.42
    (not shown)  1.39
    "р"          1.35
    "4"          1.30
    "י"          1.30
    "5"          1.29
    "6"          1.29
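Feature dashboards of this kind commonly build the positive and negative logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the largest and smallest entries. The sketch below assumes that method, which this page does not confirm. `top_logit_tokens`, `w_dec_row`, `W_U`, and `vocab` are hypothetical names, and small NumPy arrays stand in for real model weights.

```python
import numpy as np

def top_logit_tokens(w_dec_row, W_U, vocab, k=10):
    """Rank vocabulary tokens by the direct effect of one feature direction on their logits.

    w_dec_row: (d_model,) decoder direction of a single feature (hypothetical input).
    W_U:       (d_model, d_vocab) unembedding matrix (hypothetical input).
    vocab:     list of d_vocab token strings.
    """
    logit_effect = w_dec_row @ W_U        # (d_vocab,) effect on each token's logit
    order = np.argsort(logit_effect)      # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
W_U = np.array([[1.0, -1.0, 0.5, 0.0],
                [0.0,  0.0, 0.0, 3.0]])
w = np.array([1.0, 0.5])                  # logit_effect = [1.0, -1.0, 0.5, 1.5]
neg, pos = top_logit_tokens(w, W_U, vocab, k=2)
print(neg)  # [('b', -1.0), ('c', 0.5)]
print(pos)  # [('d', 1.5), ('a', 1.0)]
```

Each list is ordered from the strongest effect outward, so the first negative entry is the token whose logit the feature suppresses most.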
    Activation Density 0.176%
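Activation density is usually read as the fraction of sampled token positions on which the feature fires, so 0.176% corresponds to roughly 176 active positions per 100,000. A minimal sketch, assuming a position counts as active when the activation is strictly above 0 (the page does not state its threshold). `activation_density` is a hypothetical helper, not part of any named library.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Return the fraction of positions where a feature's activation exceeds `threshold`.

    threshold=0.0 is an assumed cutoff, typical for ReLU-style sparse autoencoder features.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))

# Toy example: the feature fires on 176 of 100,000 token positions.
acts = np.zeros(100_000)
acts[:176] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.176%
```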

    No Known Activations