INDEX
    Explanations

    references to the specific term "Box"

    references to specific 'Box' categories or labels

    Negative Logits
    Token        Logit
    "ittee"      -0.83
    "ufact"      -0.80
    "puter"      -0.78
    "merce"      -0.75
    "FUL"        -0.70
    "asury"      -0.69
    "ptoms"      -0.67
    "opoulos"    -0.66
    " beh"       -0.65
    " CSI"       -0.65
    Positive Logits
    Token        Logit
    "es"         1.07
    " Box"       1.01
    "er"         0.98
    "Box"        0.97
    "wra"        0.97
    "esy"        0.97
    "sets"       0.96
    "boxes"      0.94
    "eers"       0.94
    "box"        0.93
    Activation Density: 0.012%

    No Known Activations