INDEX
    Explanations

    specific numerical data and references in the text

    Head Attr Weights
    0:0.02
    1:0.01
    2:0.02
    3:0.15
    4:0.02
    5:0.03
    6:0.05
    7:0.02
    8:0.02
    9:0.01
    10:0.56
    11:0.03
    Negative Logits
    %.      -2.83
    '."     -2.67
    '.      -2.61
    .'"     -2.60
     };     -2.58
    .」      -2.57
    .</     -2.47
    .",     -2.40
    >.      -2.40
    !".     -2.31
    Positive Logits
    )       3.75
     )      3.38
    /)      3.14
    -)      3.06
    )"      2.86
    %)      2.74
    ?)      2.73
    )'      2.73
    !)      2.55
    ())     2.51
    Act Density 0.311%

    No Known Activations
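
The head attribution weights listed above are easier to compare once ranked. The sketch below is illustrative only. It assumes the twelve `index:weight` entries are per-head attribution weights for attention heads 0 to 11, which the listing itself does not state. The names `head_attr_weights` and `rank_heads` are hypothetical, and dividing by the listed total is a choice made here, not something the source prescribes. The rounded weights sum to roughly 0.94 rather than 1.

```python
# Head attribution weights copied from the listing above (head index -> weight).
# Assumption: each entry is the attribution weight of one attention head.
head_attr_weights = {
    0: 0.02, 1: 0.01, 2: 0.02, 3: 0.15, 4: 0.02, 5: 0.03,
    6: 0.05, 7: 0.02, 8: 0.02, 9: 0.01, 10: 0.56, 11: 0.03,
}


def rank_heads(weights):
    """Return (head, weight, share_of_listed_total) tuples, largest weight first."""
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(head, w, w / total) for head, w in ranked]


ranked = rank_heads(head_attr_weights)

# Print the three heads with the largest listed weights.
for head, w, share in ranked[:3]:
    print(f"head {head:>2}: weight {w:.2f}, share of listed total {share:.1%}")
```

Under these assumptions, head 10 carries most of the listed weight, followed by head 3.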