INDEX
    Explanations

    words indicating inclusivity or exceptions

    Negative Logits
    " nakalista"        -0.72
    " Paglinawan"       -0.59
    ":✨"               -0.54
    " Савезне"          -0.54
    " ujednoznacz"      -0.52
    " Италијани"        -0.52
    "WebVitals"         -0.52
    "tagHelperRunner"   -0.51
                        -0.50
    " мәкал"            -0.49
    Positive Logits
    " sekal"            0.59
    "Even"              0.46
    " Even"             0.46
    " even"             0.44
    " kahit"            0.42
    "即便是"            0.41
    "Même"              0.40
    " Bahkan"           0.38
    "Даже"              0.38
    " experimentado"    0.38
    Act Density 0.389%

    No Known Activations