INDEX
    Explanations

    expressions related to value and worthiness

    Negative Logits
        " Values"      -0.18
        "Values"       -0.16
        "onta"         -0.16
        " values"      -0.15
        "values"       -0.15
        " Whe"         -0.14
        "elli"         -0.14
        "-values"      -0.14
        " du"          -0.14
        " ras"         -0.14

    Positive Logits
        " worth"           0.88
        " Worth"           0.77
        "worth"            0.71
        " worthwhile"      0.52
        "sworth"           0.38
        " worthy"          0.37
        "worthy"           0.32
        "orth"             0.31
        "ORTH"             0.31
        "-worthy"          0.30

    Activation Density: 0.109%

    No Known Activations