    Explanations

    less (in different languages)

    Negative Logits
    "=True"                               -0.08
    " ALWAYS"                             -0.08
    " täi"                                -0.07
    " Unauthorized"                       -0.07
    " ================================="  -0.07
                                          -0.06
    " forbid"                             -0.06
    "houette"                             -0.06
    "elyn"                                -0.06
                                          -0.06
    Positive Logits
    " less"     0.59
    " menos"    0.55
    " weniger"  0.54
    " moins"    0.51
    " Less"     0.51
    "Less"      0.50
    " fewer"    0.45
    " меньше"   0.44
    " mniej"    0.44
    " vähem"    0.43
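    This page does not say how the positive and negative logits are computed. A common approach for sparse-autoencoder feature dashboards, assumed here, is to take the dot product of the feature's decoder direction with each vocabulary token's unembedding vector and report the highest and lowest results. The sketch below does this in plain Python for a hypothetical 3-dimensional model with a five-token toy vocabulary; `feature_logits`, `top_logits`, and every number in it are illustrative and are not data from this feature.

```python
# Minimal sketch (assumption): a feature's "positive/negative logits" are the dot
# products of its decoder direction with each token's unembedding vector.
from typing import Dict, List, Tuple


def feature_logits(decoder_vec: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project the (hypothetical) decoder direction onto every token's unembedding."""
    return {
        tok: sum(d * u for d, u in zip(decoder_vec, col))
        for tok, col in unembed.items()
    }


def top_logits(logits: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest (positive) and k smallest (negative) token logits."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy values only: a made-up 3-d decoder direction and unembedding columns.
decoder_vec = [1.0, 0.0, -0.5]
unembed = {
    " less":   [2.0, 0.0, 0.0],
    " menos":  [1.5, 1.0, 0.0],
    " fewer":  [1.0, 0.0, 0.0],
    " ALWAYS": [0.0, 1.0, 2.0],
    " forbid": [0.0, 0.0, 1.0],
}

logits = feature_logits(decoder_vec, unembed)
positive, negative = top_logits(logits, k=2)
for tok, val in positive + negative:
    print(f"{tok!r:12} {val:+.2f}")
```

    In a real model the unembedding would be a d_model × vocab matrix and the decoder vector would come from the trained autoencoder; whether this dashboard applies any additional scaling or normalization is not stated on the page.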
    Activation Density 0.256%
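    The page reports an activation density but not how it is measured. Assuming it is the fraction of sampled tokens on which the feature's activation is strictly positive, 0.256% means the feature fires on roughly one token in 390. A minimal sketch of that calculation follows; `activation_density` is a hypothetical helper, and the activation sample is made up to reproduce the same percentage.

```python
# Minimal sketch (assumption): activation density = share of tokens whose
# feature activation is strictly positive.
from typing import Sequence


def activation_density(acts: Sequence[float]) -> float:
    """Fraction of tokens with a positive activation (assumed definition)."""
    if len(acts) == 0:
        raise ValueError("need at least one activation value")
    return sum(1 for a in acts if a > 0) / len(acts)


# Hypothetical sample: the feature fires on 256 of 100,000 tokens.
acts = [0.0] * (100_000 - 256) + [1.0] * 256
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```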

    No Known Activations