INDEX
    Explanations

    questions and introductory phrases that signal explanations or observations

    Head Attribution Weights (head: weight)
    0:0.04
    1:0.01
    2:0.07
    3:0.05
    4:0.03
    5:0.11
    6:0.02
    7:0.03
    8:0.41
    9:0.03
    10:0.08
    11:0.05
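One common way to produce per-head weights like these is to split the feature's decoder vector into one chunk per attention head and report each chunk's share of the total norm. The sketch below illustrates that idea. It assumes the decoder vector lives in the concatenated per-head output space (head-major order) and that plain L2 norms are used. Both assumptions are ours, and `head_attribution_weights` is a hypothetical helper, not the dashboard's actual code.

```python
import math


def head_attribution_weights(decoder_row, n_heads):
    """Hypothetical helper: share of a feature's decoder norm carried by each head.

    Assumes decoder_row is the feature's decoder vector in the concatenated
    per-head output space (n_heads * d_head values, head-major order).
    """
    if n_heads <= 0 or len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be a positive multiple of n_heads")
    d_head = len(decoder_row) // n_heads
    # L2 norm of each head's contiguous chunk of the decoder vector.
    norms = [
        math.sqrt(sum(x * x for x in decoder_row[h * d_head:(h + 1) * d_head]))
        for h in range(n_heads)
    ]
    total = sum(norms)
    if total == 0:
        raise ValueError("decoder_row is all zeros")
    # Normalise so the weights sum to 1.
    return [n / total for n in norms]


# Toy example: 3 heads with d_head = 2.
weights = head_attribution_weights([3.0, 4.0, 0.0, 0.0, 6.0, 8.0], n_heads=3)
print([round(w, 3) for w in weights])  # [0.333, 0.0, 0.667]
```

Under this reading, the single large entry in the list (head 8, weight 0.41) would mean that most of the feature's output is written through that one head.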
    Negative Logits
    " Kear"           -2.01
    "acan"            -1.71
    "venge"           -1.70
    "prop"            -1.66
    "yg"              -1.62
    "yp"              -1.58
    "wra"             -1.54
    "aer"             -1.50
    "bec"             -1.46
    "arthed"          -1.42
    Positive Logits
    "ALSE"            1.90
    "ulla"            1.87
    "ModLoader"       1.79
    "soDeliveryDate"  1.67
    "earances"        1.61
    " affirmative"    1.61
    "nexus"           1.60
    "microsoft"       1.59
    "IDA"             1.54
    "odka"            1.54
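Lists of top positive and negative logits like these are typically obtained by projecting a feature's output direction onto the unembedding matrix and ranking the resulting per-token scores. The sketch below shows only that ranking step. It assumes the direction has already been mapped into the residual stream; for an attention-output feature, this mapping would go through the output projection, which is omitted here. `top_logits` and its arguments are hypothetical names, and plain Python lists stand in for real model weights.

```python
def top_logits(direction, W_U, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by the logit a direction writes.

    direction: d_model floats (the feature's output direction in the residual stream).
    W_U: unembedding as a list of d_model rows, each with len(vocab) columns.
    Returns (positive, negative): the k highest- and k lowest-scoring (token, logit) pairs.
    """
    n_vocab = len(vocab)
    # logit[j] = sum_i direction[i] * W_U[i][j]
    logits = [
        sum(direction[i] * W_U[i][j] for i in range(len(direction)))
        for j in range(n_vocab)
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]                  # most negative first
    positive = list(reversed(ranked))[:k]  # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 3 tokens.
pos, neg = top_logits([1.0, 0.0], [[2.0, -1.0, 0.5], [0.0, 3.0, 0.0]], ["a", "b", "c"], k=1)
print(pos, neg)  # [('a', 2.0)] [('b', -1.0)]
```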
    Activation Density: 0.073%
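Activation density is usually read as the percentage of sampled tokens on which the feature fires. At 0.073%, that works out to roughly 73 of every 100,000 tokens. The sketch below computes that percentage. It assumes "fires" means an activation strictly above a threshold of 0; the threshold and the helper name `activation_density` are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of tokens on which a feature fires.

    A token counts as an activation when its value exceeds `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 25.0
```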

    No Known Activations