    Explanations

    instances of the word "two" and its variants

    Negative Logits
    Token             Logit
    "ly"              -1.01
    "ñores"           -0.70
    " argint"         -0.69
    " ainfi"          -0.68
    "er"              -0.66
    "čaj"             -0.66
    " vectorielles"   -0.66
    " enfans"         -0.66
    "mer"             -0.65
    " vaisselle"      -0.64
    Positive Logits
    Token                Logit
    "Према"              0.97
    " Two"               0.86
    " dozen"             0.82
    " CreateTagHelper"   0.82
    (blank)              0.81
    "Two"                0.81
    " CWE"               0.81
    "abetes"             0.79
    " two"               0.78
    " Twee"              0.77
    Activation Density: 0.123%

    No Known Activations