    Explanations

    punctuation and formatting variations in text

    Negative Logits
    "ीछ"             -0.17
    " overall"       -0.15
    " sadly"         -0.14
    " incident"      -0.14
    "ariant"         -0.13
    " exc"           -0.13
    "ANO"            -0.13
    " later"         -0.13
    "olders"         -0.13
    "Overall"        -0.13

    Positive Logits
    " Enter"          0.45
    "Enter"           0.44
    " enter"          0.41
    " Fortunately"    0.38
    ".enter"          0.36
    ".Enter"          0.35
    " Luckily"        0.35
    " enters"         0.35
    "Fortunately"     0.35
    "enter"           0.34
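The page does not say how the logit lists above were produced. A common approach, assumed here, is to take the dot product of the feature's output direction with each token's unembedding vector and rank tokens by that score. The sketch below uses pure Python; the names `logit_effects`, `top_logits`, `direction` and `unembed` are hypothetical, and the 2-dimensional vectors are invented toy values, not this feature's weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}

def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative tokens and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # lowest scores first
    positive = ranked[::-1][:k]    # highest scores first
    return negative, positive

# Toy example: invented vectors, not taken from this feature.
# Leading spaces are part of the token strings, as in the lists above.
direction = [1.0, 0.5]
unembed = {
    " Enter": [0.4, 0.1],
    " Fortunately": [0.3, 0.1],
    " sadly": [-0.1, -0.1],
    " overall": [-0.2, 0.0],
}
effects = logit_effects(direction, unembed)
neg, pos = top_logits(effects, k=2)
print(neg)
print(pos)
```

With these toy values, " overall" and " sadly" come out as the most negative tokens and " Enter" and " Fortunately" as the most positive.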
    Activation Density: 0.260%
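Activation density is assumed here to mean the percentage of sampled tokens on which the feature's activation is above zero; the page does not define it. A minimal sketch under that assumption, where `activation_density` and its `threshold` parameter are hypothetical and the 13-of-5,000 sample is invented to illustrate the arithmetic:

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        return 0.0  # no tokens sampled, so report zero density
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Invented sample: 13 active tokens out of 5,000 (13 / 5000 = 0.26%).
acts = [0.0] * 4987 + [1.2] * 13
print(f"{activation_density(acts):.3f}%")
```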

    No Known Activations