INDEX
    Explanations

    the word "all" with a high level of activation

    Negative Logits
    "IDS"           -0.66
    "SHIP"          -0.63
    "bledon"        -0.62
    " Kamp"         -0.62
    "sofar"         -0.61
    " Caption"      -0.61
    " KH"           -0.60
    " Provision"    -0.59
    "plin"          -0.58
    "abwe"          -0.57
    Positive Logits
    "igator"        1.29
    "ocating"       1.23
    "usion"         1.13
    "igators"       1.12
    "usions"        1.05
    "uring"         1.04
    "usive"         1.00
    "iance"         1.00
    "iances"        0.99
    "edged"         0.98
    Activation Density: 0.032%

    No Known Activations