INDEX
    Explanations
    Negative Logits
    Token                  Logit
    "dur"                  -0.46
    " Ellie"               -0.45
    " הרי"                 -0.45
    "resco"                -0.44
    "icose"                -0.43
    "velli"                -0.43
    (token not rendered)   -0.43
    "oneofs"               -0.42
    " Vande"               -0.41
    "oa̍t"                  -0.41
    Positive Logits
    Token                  Logit
    " Japan"                2.03
    "Japan"                 1.91
    " Japón"                1.67
    " japan"                1.66
    " JAPAN"                1.60
    "JAPAN"                 1.44
    "japan"                 1.41
    " Giappone"             1.41
    " Japanese"             1.35
    " Japão"                1.34
    Activation Density: 0.006%

    No Known Activations