INDEX
    Explanations

    foreign languages

    Negative Logits
      " ("         -0.08
      "1"          -0.08
      "21"         -0.08
      " Mustang"   -0.07
      "':['"       -0.07
      "++,"        -0.07
      " '-"        -0.07
      "5"          -0.07
      " bằng"      -0.07
      "11"         -0.07
    Positive Logits
      "\tchild"     0.10
      "ドル"        0.07
      " adicion"    0.07
      " MAN"        0.07
                    0.06
      " Certif"     0.06
      " Rape"       0.06
      " vent"       0.06
      "ded"         0.06
      "gorith"      0.06
    Activation density: 0.399%

    No Known Activations