INDEX
    Explanations

    phrases referring to ways in which actions or scenarios are compared or related

    comparisons that express similarity or analogy between different subjects or concepts

    Negative Logits
    "igh"       -0.73
    " mun"      -0.62
    "ONSORED"   -0.60
    "Throw"     -0.59
    "eri"       -0.58
    "ategor"    -0.57
    " McGee"    -0.56
    "bart"      -0.56
    " compe"    -0.55
    "throw"     -0.55
    Positive Logits
    "ettings"              0.79
    " rapists"             0.70
    "ounter"               0.70
    "isSpecialOrderable"   0.69
    "achu"                 0.69
    "abl"                  0.68
    "liness"               0.65
    "ractor"               0.64
    "aws"                  0.61
    " Cooke"               0.61
    Activation Density: 0.045%

    No Known Activations