INDEX
    Explanations

    structural elements or markers in the text, such as brackets or punctuation

    Head Attr Weights
    0:0.09
    1:0.23
    2:0.08
    3:0.09
    4:0.06
    5:0.03
    6:0.10
    7:0.06
    8:0.03
    9:0.04
    10:0.05
    11:0.08
    Negative Logits
    Token          Logit
    "Si"           -3.14
    " Mour"        -2.96
    " Machines"    -2.77
    "ô"            -2.62
    "Reviewer"     -2.60
    "Johnson"      -2.56
    "separ"        -2.52
    " Popular"     -2.52
    "cho"          -2.49
    " JO"          -2.48
    Positive Logits
    Token          Logit
    " baseline"    5.72
    " Unch"        3.80
    "base"         3.56
    "elines"       3.43
    " normalized"  3.42
    " peanuts"     3.08
    " STAND"       2.94
    "bas"          2.92
    "ummies"       2.85
    " base"        2.82
    Act Density 0.000%

    No Known Activations