INDEX
    Explanations
    Negative Logits
                -3.78
    " But"      -3.70
    " แต่"       -3.38
                -2.98
                -2.97
                -2.94
    "݈"         -2.86
    " nhưng"    -2.81
                -2.81
                -2.73
    Positive Logits
    "0"         4.28
    "b"         3.19
    "get"       3.00
    "8"         2.84
    "s"         2.80
    "y"         2.78
    "6"         2.75
    " gets"     2.66
    "g"         2.63
    "your"      2.58
    Act Density 0.002%

    No Known Activations