INDEX
    Explanations

    distinctions and characteristics in classifications

    Negative Logits
    Token           Logit
    " salopes"      -0.17
    "anou"          -0.15
    "ár"            -0.14
    "riad"          -0.14
    " echang"       -0.14
    " feder"        -0.14
    "_Reference"    -0.13
    "deb"           -0.13
    "alia"          -0.13
    "escort"        -0.13
    Positive Logits
    Token           Logit
    " rather"       0.23
    " just"         0.20
    " paradox"      0.19
    "rather"        0.19
    " implicit"     0.17
    " plus"         0.17
    " contra"       0.17
    " plutôt"       0.17
    " bien"         0.16
    " intuit"       0.16
    Activation Density: 0.037%

    No Known Activations