    Explanations

    phrases indicating improvement or enhancement

    Negative Logits
    "SequentialGroup"    -0.64
    " coroa"             -0.57
    " Palacios"          -0.52
    " Schofield"         -0.52
    "CardModule"         -0.50
    " Carrasco"          -0.49
    "skraft"             -0.49
    " Carrillo"          -0.49
    "sessionId"          -0.49
    " Murdoch"           -0.48
    Positive Logits
    "better"             1.71
    "Better"             1.70
    " better"            1.66
    " Better"            1.59
    " BETTER"            1.50
    " mejor"             1.26
    " bessere"           1.19
    " mieux"             1.15
    " besseren"          1.14
    " melhor"            1.13
    Activation Density: 0.017%

    No Known Activations