INDEX
    Explanations

    the word "Heritage" at various activation levels

    mentions of the Heritage Foundation

    Negative Logits
    Token           Logit
    "redd"          -0.71
    "agram"         -0.70
    "tered"         -0.70
    "orders"        -0.67
    "vern"          -0.66
    " unsub"        -0.66
    "odiac"         -0.65
    "gradient"      -0.65
    "ching"         -0.65
    "sie"           -0.63
    Positive Logits
    Token           Logit
    " Heritage"     1.16
    "conservancy"   1.05
    "itage"         0.89
    " Foundation"   0.81
    " Collection"   0.77
    " Institutes"   0.74
    "icity"         0.72
    " Dictionary"   0.72
    " Institute"    0.72
    " Values"       0.71
    Activation density: 0.008%

    No Known Activations