INDEX
    Explanations

    positive adjectives that convey improvement or enhancement

    Negative Logits
    Token                 Logit
    " similarity"         -0.62
    "SpaceEngineers"      -0.62
    "atum"                -0.60
    " similarities"       -0.58
    "stanbul"             -0.56
    "alan"                -0.56
    "rat"                 -0.55
    "athi"                -0.55
    "utical"              -0.54
    " authorization"      -0.53
    Positive Logits
    Token                 Logit
    "terday"              0.75
    "ible"                0.72
    " anew"               0.70
    "Enlarge"             0.66
    " again"              0.65
    "ISH"                 0.65
    "nell"                0.65
    "enged"               0.64
    "TER"                 0.63
    "IRED"                0.62
    Activation Density: 0.078%

    No Known Activations