INDEX
    Explanations

    prepositions

    Negative Logits
     ingr
    -0.06
    -0.06
     actionTypes
    -0.06
    bert
    -0.06
    yk
    -0.06
     Fisher
    -0.06
    Minus
    -0.06
     Damascus
    -0.06
    Direction
    -0.06
     Libyan
    -0.06
    Positive Logits
    лей
    0.07
    تان
    0.06
    _DOT
    0.06
     Canvas
    0.06
     söylem
    0.06
     Pist
    0.06
    Stage
    0.06
    έρει
    0.06
    _rs
    0.06
     @@↵
    0.06
    Activation Density 0.314%

    No Known Activations