INDEX
    Explanations

    words associated with measuring and analyzing performance or conditions

    Negative Logits
        Token             Logit
        the               -0.87
        [token missing]   -0.78
        can               -0.76
        also              -0.76
        he                -0.75
        have              -0.73
        about             -0.73
        all               -0.73
        it                -0.72
        are               -0.71
    Positive Logits
        Token     Logit
        enfans    0.92
        Theſe     0.89
        feroit    0.88
        Houſe     0.88
        pngtree   0.88
        ſche      0.88
        itſelf    0.87
        myſelf    0.86
        avoient   0.85
        igång     0.84
    Activation Density: 4.697%

    No Known Activations