INDEX
    Explanations

    phrases related to familiarity or common knowledge

    phrases or expressions indicating familiarity or common experiences

    Negative Logits
    Token               Logit
    " Accessed"         -0.72
    "sterdam"           -0.72
    " smokes"           -0.67
    " largeDownload"    -0.65
    "afety"             -0.63
    "uli"               -0.63
    "ahead"             -0.62
    "backs"             -0.61
    "croft"             -0.60
    "MORE"              -0.60
    Positive Logits
    Token               Logit
    "ggles"             1.00
    "ilet"              0.74
    "pper"              0.74
    " contemplate"      0.73
    " outsiders"        0.73
    "wered"             0.72
    " behold"           0.71
    "asty"              0.70
    "ADS"               0.70
    " everyone"         0.70
    Activation Density: 0.178%

    No Known Activations