INDEX
    Explanations

    adjectives related to quality or performance

    terms related to mixed outcomes and varying quality

    Negative Logits
    dfx              -0.73
     Encyclopedia    -0.72
     smashed         -0.61
     Ships           -0.61
     RL              -0.61
    oin              -0.61
    enium            -0.61
     Xan             -0.61
     OM              -0.61
     Moons           -0.59
    Positive Logits
     distingu        0.72
    ikawa            0.70
    conserv          0.69
    6666             0.67
    luster           0.67
     dism            0.67
    abwe             0.65
     conduc          0.65
     fared           0.65
    Compar           0.65
    Act Density 0.485%

    No Known Activations