INDEX
    Explanations

    comparative phrases or contrasting ideas

    Negative Logits
    Token             Logit
    " 매우"           -0.53
    " Dernière"       -0.53
    "غه"              -0.52
    " eneste"         -0.52
    "//"              -0.50
    " einzig"         -0.50
    "consulté"        -0.49
    (no token text)   -0.49
    "ardes"           -0.48
    " &("             -0.47
    Positive Logits
    Token             Logit
    " safer"          1.10
    " stronger"       1.03
    " richer"         1.02
    " healthier"      1.00
    " happier"        0.99
    " clearer"        0.98
    " slower"         0.98
    " higher"         0.96
    " harder"         0.95
    " stiffer"        0.95
    Act Density 0.727%
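
The density figure above is presumably the fraction of sampled tokens on which this feature has a nonzero activation. A minimal sketch of that calculation follows, assuming the per-token activations are available as a flat list of floats. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if len(activations) == 0:
        # No tokens sampled, so report zero density rather than dividing by zero.
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical example: the feature fires on one of four tokens.
sample = [0.0, 0.0, 0.5, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.727% would mean the feature is active on roughly 7 of every 1,000 sampled tokens.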

    No Known Activations