INDEX
    Explanations

    occurrences of the word "of"

    Negative Logits
    Token              Logit
    " partName"        -0.70
    " appra"           -0.67
    " FW"              -0.65
    " passers"         -0.64
    " MacArthur"       -0.62
    " ridic"           -0.59
    " Prairie"         -0.58
    " multiplication"  -0.57
    " Extras"          -0.57
    " CBI"             -0.56
    Positive Logits
    Token              Logit
    "sky"              1.28
    "rontal"           1.17
    "ield"             1.16
    "lav"              1.14
    "milo"             1.03
    "ortunately"       1.02
    "rame"             1.01
    "ski"              0.98
    "rio"              0.98
    "icial"            0.98
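Dashboards of this kind commonly build the logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then sorting the result. The sketch below assumes that method. It uses a toy vocabulary and made-up weights (`decoder_dir`, `W_U`, `vocab` and `top_logits` are all hypothetical) and only illustrates how the most positive and most negative tokens are selected.

```python
import numpy as np

def top_logits(decoder_dir, W_U, vocab, k=3):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: logit effect = decoder_dir @ W_U.
    decoder_dir: (d_model,) hypothetical feature direction
    W_U: (d_model, vocab_size) hypothetical unembedding matrix
    """
    logits = decoder_dir @ W_U      # (vocab_size,) effect on each token's logit
    order = np.argsort(logits)      # token indices sorted ascending by logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["sky", "ski", "CBI", "FW"]
W_U = np.array([[1.0, 0.5, -0.5, -1.0],
                [0.0, 0.0,  0.0,  0.0]])
decoder_dir = np.array([1.0, 0.0])
neg, pos = top_logits(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('sky', 1.0), ('ski', 0.5)]
print(neg)  # [('FW', -1.0), ('CBI', -0.5)]
```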
    Activation Density: 0.026%
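Activation density is usually reported as the fraction of tokens on which the feature's activation is nonzero. Under that assumption, 0.026% corresponds to about 26 tokens out of every 100,000. The minimal sketch below uses hypothetical activations; `activation_density` and its `threshold` parameter are illustrative names, not part of any dashboard's API.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Hypothetical activations over 100,000 tokens: the feature fires on 26 of them.
acts = np.zeros(100_000)
acts[:26] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.026%
```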

    No Known Activations