    Negative Logits
    Token                 Logit
    " nouveau"            1.68
    (blank in source)     1.62
    " ausschließlich"     1.60
    " nouvel"             1.58
    "𝘃"                   1.56
    "̦"                    1.49
    "்ட"                  1.47
    " таки"               1.47
    "𝗺"                   1.43
    (blank in source)     1.43
    Positive Logits
    Token                 Logit
    " able"               1.73
    "ित"                  1.63
    "ists"                1.60
    " admiration"         1.58
    "ле"                  1.58
    ">)"                  1.54
    " steppe"             1.54
    "amazing"             1.54
    (blank in source)     1.53
    " FHWA"               1.48
    Activation Density: 0.001%

    No Known Activations