INDEX
    Explanations

    mentions of numerical values and measurements

    capitalized names and significant phrases

    Negative Logits
    " metic"           -0.86
    " authoritative"   -0.77
    " ÂŃ"              -0.76
    " inspected"       -0.75
    " ALEC"            -0.74
    " Airbnb"          -0.74
    " laboratories"    -0.73
    " Layer"           -0.73
    " evaluated"       -0.72
    " BART"            -0.72
    Positive Logits
    "and"     1.43
    "stre"    1.43
    "nor"     1.41
    "him"     1.41
    "requ"    1.40
    "kn"      1.40
    "cond"    1.39
    "from"    1.38
    "comm"    1.38
    "while"   1.38
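The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists for a sparse-autoencoder feature, assumed here rather than confirmed by this page, is to multiply the feature's decoder direction by the model's unembedding matrix and keep the k lowest and k highest entries. The minimal pure-Python sketch below uses hypothetical helper names (`logit_effects`, `top_and_bottom`) and a toy 2-dimensional model; it is an illustration, not the dashboard's actual implementation.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: project a feature direction through the unembedding.

    decoder_dir has length d_model; unembed is a d_model x vocab_size matrix.
    Returns one logit contribution per vocabulary token.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(tokens: Sequence[str], effects: Sequence[float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return positive, negative


# Toy example (made-up numbers): 2-dimensional residual stream, 4-token vocabulary.
tokens = ["and", " metic", "from", " Layer"]
decoder_dir = [1.0, -0.5]
unembed = [[1.0, -0.5, 0.8, -0.2],
           [0.4, 0.6, 0.2, 1.0]]
positive, negative = top_and_bottom(tokens, logit_effects(decoder_dir, unembed), k=2)
print("positive:", positive)
print("negative:", negative)
```

In a real model the vocabulary has tens of thousands of entries and k is around 10, which matches the length of each list on this page.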
    Activation Density: 0.255%
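Activation density is read here as the share of evaluated tokens on which the feature fires, so 0.255% corresponds to roughly 2.55 tokens in every 1,000. The sketch below assumes that "fires" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy check with made-up data: 510 firing tokens out of 200,000.
acts = [1.0] * 510 + [0.0] * (200_000 - 510)
print(f"Activation density: {activation_density(acts):.3f}%")  # prints "Activation density: 0.255%"
```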

    No Known Activations