INDEX
    Explanations

    phrases or statements that indicate the fabrication or creation of stories

    Negative Logits
    Token                   Logit
    "luaj"                  -0.74
    "zik"                   -0.68
    "hens"                  -0.65
    "ridor"                 -0.63
    "Gi"                    -0.63
    "DEM"                   -0.63
    "Xi"                    -0.62
    "externalActionCode"    -0.61
    "avery"                 -0.61
    "gnu"                   -0.61
    Positive Logits
    Token                   Logit
    "ulates"                0.82
    " excuses"              0.81
    "ulate"                 0.73
    "ulated"                0.70
    "itional"               0.69
    " excuse"               0.68
    "ulations"              0.67
    "iframe"                0.65
    "ulating"               0.64
    " stories"              0.62
    Act Density 0.021%

    No Known Activations