INDEX
    Explanations

    "your" followed by a noun

    Negative Logits
     மற்றும்    -1.58
    layah    -1.45
    Suka    -1.43
     않는    -1.41
     sepeda    -1.39
    我們    -1.39
    Dzień    -1.38
     그녀    -1.38
    ขอบคุณ    -1.37
    jaket    -1.37
    Positive Logits
    _    2.09
    '    1.98
    ,    1.72
    (no visible token)    1.61
    new    1.61
    (no visible token)    1.52
     \    1.48
    ra    1.48
    car    1.47
    la    1.45
    Activation Density: 0.029%

    No Known Activations