INDEX
    Explanations

    evaluative phrases and descriptors

    Negative Logits
    " Catal"           -0.16
    " arg"             -0.16
    "izin"             -0.15
    " Schwarz"         -0.15
    " Arg"             -0.15
    "arg"              -0.15
    "NullException"    -0.14
    "ected"            -0.13
    "anta"             -0.13
    " Twin"            -0.13
    Positive Logits
    " description"     0.17
    "uger"             0.15
    "»"                0.15
    "adesh"            0.15
    "atch"             0.14
    ".strict"          0.14
    "description"      0.14
    "´"                0.14
    " applied"         0.14
    "-description"     0.14
    Activation Density: 0.072%

    No Known Activations