    Explanations

    instances of the word "invisible."

    Head Attribution Weights
    Head  Weight
    0     0.03
    1     0.02
    2     0.06
    3     0.06
    4     0.11
    5     0.04
    6     0.05
    7     0.29
    8     0.04
    9     0.04
    10    0.10
    11    0.09
    Negative Logits
    Token                  Logit
    "artifacts"            -1.90
    "emouth"               -1.75
    "rador"                -1.63
    "erion"                -1.60
    "wagen"                -1.49
    "oyd"                  -1.49
    "overe"                -1.49
    "foundland"            -1.48
    "outheast"             -1.48
    "natureconservancy"    -1.46
    Positive Logits
    Token                  Logit
    "until"                 1.43
    " MIT"                  1.41
    " Decay"                1.36
    " markup"               1.31
    " Nak"                  1.31
    " compute"              1.30
    " computation"          1.28
    " Payton"               1.27
    " Dir"                  1.26
    " introduction"         1.25
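The two token lists show the feature's direct effect on the output logits. Below is a minimal sketch of one standard way such lists are produced. It assumes the feature's decoder vector `d_dec` has already been mapped into residual-stream space (for an attention-output feature, that means through the output projection) and ignores the final LayerNorm. The function name and toy matrices are hypothetical, and this is not necessarily how this dashboard computed its values.

```python
import numpy as np


def top_logit_effects(d_dec, W_U, vocab, k=10):
    """Return the k most-promoted and k most-suppressed tokens for one feature.

    d_dec: (d_model,) feature decoder direction, assumed to already be in
           residual-stream space.
    W_U:   (d_model, n_vocab) unembedding matrix.
    vocab: sequence of n_vocab token strings.
    """
    # Direct effect of the feature direction on every token's logit.
    logit_effect = d_dec @ W_U  # shape (n_vocab,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with a hypothetical 4-dim model and a 6-token vocabulary.
vocab = ["a", "b", "c", "d", "e", "f"]
W_U = np.zeros((4, 6))
W_U[0] = [0.5, -1.0, 2.0, 0.0, -0.3, 1.2]
d_dec = np.array([1.0, 0.0, 0.0, 0.0])  # reads out row 0 of W_U

pos, neg = top_logit_effects(d_dec, W_U, vocab, k=2)
print(pos)  # [('c', 2.0), ('f', 1.2)]
print(neg)  # [('b', -1.0), ('e', -0.3)]
```

Sorting once and reading both ends avoids computing two separate top-k passes. With a real model, `W_U` has one column per vocabulary entry, so tokens with a leading space (such as " MIT") are distinct entries from their unspaced forms.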
    Activation Density: 0.001%

    No Known Activations
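Activation density is presumably the fraction of sampled tokens on which the feature fires, so 0.001% corresponds to roughly one token in 100,000. At that rate, a modest text sample can easily contain no activations at all, which is consistent with the "No Known Activations" note. Below is a minimal sketch of the calculation, assuming a firing threshold of zero. The helper name and the sample data are hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return np.count_nonzero(acts > threshold) / acts.size


# Hypothetical sample: 200,000 tokens, and the feature fires on only two of them.
acts = np.zeros(200_000)
acts[[10, 150_000]] = 0.7
density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```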