INDEX
    Explanations

    references to entities or groups, particularly related to demographics or categories

    Negative Logits
    Token              Logit
    "utils"            -0.69
    "nings"            -0.64
    "atoes"            -0.62
    "few"              -0.62
    "pse"              -0.62
    " guiName"         -0.62
    "arde"             -0.61
    " preparations"    -0.60
    "taker"            -0.60
    "xxxxxxxx"         -0.60
    Positive Logits
    Token              Logit
    " differing"        1.01
    " varying"          0.93
    " color"            0.89
    " colour"           0.82
    " various"          0.81
    " different"        0.79
    " diverse"          0.78
    " other"            0.77
    " varied"           0.76
    " Colour"           0.73
    Activation Density: 0.288%

    No Known Activations