INDEX
    Explanations

    foreign languages

    Negative Logits
    =yes       -0.07
     Mickey    -0.07
    Gs         -0.07
    (Token     -0.06
     работа    -0.06
    SCORE      -0.06
    <quote     -0.06
     sincere   -0.06
    ipes       -0.06
    илання     -0.06
    Positive Logits
    	ON     0.08
     ATV       0.07
    .pix       0.07
    0.06
    0.06
    -none      0.06
     İst       0.06
     HOW       0.06
     Berlin    0.06
    _MAY       0.06
    Act Density 0.010%

    No Known Activations