INDEX
    Explanations

    names indicated by the prefix "De"

    Negative Logits
     Dhabi        -0.74
     Romanian     -0.72
    âĶĢâĶĢ        -0.68
     UA           -0.66
     Ai           -0.64
     Korra        -0.64
     Croatian     -0.63
     Kath         -0.63
     Paulo        -0.63
     Sina         -0.62

    Positive Logits
    antz           0.93
    atch           0.78
    idge           0.74
    fork           0.74
    steen          0.74
    scl            0.73
    ĵĺ             0.72
    otta           0.69
    zinski         0.69
    hol            0.68
    Act Density 0.115%

    No Known Activations