    Explanations

    sentences that express value or interest in various topics

    Negative Logits
    Token                 Logit
    " Italijanski"        -0.54
    "CompleteListener"    -0.49
    "hythm"               -0.48
    " automatiques"       -0.46
    "haviours"            -0.46
    " boisson"            -0.45
    "getClassLoader"      -0.44
    " esperienze"         -0.43
    "ợt"                  -0.43
    "haviors"             -0.42
    Positive Logits
    Token                 Logit
    " useful"             0.92
    " worth"              0.92
    " valuable"           0.91
    " usefulness"         0.90
    " value"              0.87
    "valuable"            0.86
    "worth"               0.86
    "Worth"               0.82
    "useful"              0.82
    " WORTH"              0.81
    Activation Density: 0.346%

    No Known Activations