    Explanations

    phrases describing specific types or instances of items or concepts

    phrases that introduce examples or instances

    Negative Logits
    Token             Logit
    "ombat"           -0.73
    "ribution"        -0.72
    "antage"          -0.66
    "dollar"          -0.66
    "orem"            -0.65
    "ushima"          -0.63
    "emi"             -0.63
    "iliate"          -0.62
    "Cause"           -0.60
    " Bore"           -0.60

    Positive Logits
    Token             Logit
    "ties"            0.79
    "cond"            0.74
    "things"          0.70
    " Osw"            0.66
    "odon"            0.65
    "types"           0.61
    "requ"            0.61
    " embodiments"    0.61
    "necess"          0.60
    " prec"           0.60

    Activation Density: 0.030%

    No Known Activations