INDEX
    Explanations

    French language

    Negative Logits
    Token         Logit
    " dubbed"     -0.08
    "Chief"       -0.08
    " nab"        -0.07
    " Peg"        -0.07
    "LEN"         -0.07
    " мере"       -0.07
    " Chief"      -0.07
    " census"     -0.07
    " Ai"         -0.07
    "(me"         -0.07
    Positive Logits
    Token         Logit
    " hue"        0.08
    " hib"        0.08
    " mdi"        0.08
    " contest"    0.07
    " hc"         0.07
    " Fernando"   0.07
    " позитив"    0.07
    " Contest"    0.07
    " helder"     0.07
    "vast"        0.07
    Activation Density: 0.003%

    No Known Activations