INDEX
    Explanations

    comparisons or choices between different entities

    comparisons or contrasts between two entities or ideas

    Negative Logits
    shire     -0.79
    lied      -0.77
    overed    -0.72
    olog      -0.71
    iola      -0.71
    estone    -0.71
    ERN       -0.70
    ortal     -0.70
    ogen      -0.70
    YD        -0.69
    Positive Logits
    hill        0.68
     theirs     0.65
    pecting     0.62
    creen       0.60
     bandits    0.60
     USPS       0.59
     expecting  0.59
     nil        0.58
     await      0.58
     hers       0.57
    Activation Density: 0.020%

    No Known Activations