INDEX
    Explanations

    Negation/indefinite articles

    Negative Logits
    Token            Logit
    (not shown)      -0.07
    " مم"            -0.06
    "ê"              -0.06
    " carne"         -0.06
    (not shown)      -0.06
    " dq"            -0.06
    "	Point"        -0.06
    " ak"            -0.06
    " руках"         -0.06
    " delicious"     -0.06
    Positive Logits
    Token            Logit
    "ταν"            0.06
    " humor"         0.06
    "_quotes"        0.06
    "Speech"         0.06
    " Speech"        0.06
    " обще"          0.06
    " birçok"        0.06
    " область"       0.06
    " presenting"    0.06
    "となる"          0.06
    Activation Density: 0.048%

    No Known Activations