INDEX
    Explanations

    the presence of the word "block" in various contexts

    Negative Logits
    Token             Logit
    " prest"          -0.83
    " appropri"       -0.74
    "tti"             -0.67
    " reluct"         -0.66
    "hospital"        -0.66
    "ppe"             -0.66
    " subdu"          -0.65
    " toget"          -0.65
    " composition"    -0.60
    " seaf"           -0.60
    Positive Logits
    Token             Logit
    "able"            1.18
    "ables"           1.07
    "ages"            1.05
    "ers"             1.05
    "ishable"         0.98
    "ances"           0.96
    "ment"            0.94
    "zee"             0.94
    "ades"            0.93
    "aded"            0.93
    Act Density 0.003%

    No Known Activations