INDEX
    Explanations

    questions or statements involving knowledge or information

    references to guessing and knowledge-based questioning

    Negative Logits
    Contents    -0.76
    egu         -0.73
    UTC         -0.71
    Dialogue    -0.69
    imm         -0.68
    itals       -0.68
    egal        -0.68
    enge        -0.68
    roman       -0.67
    ²¾          -0.67
    Positive Logits
     Oops        0.80
     Pledge      0.74
     Want        0.74
     Won         0.74
     Bike        0.71
     Spice       0.70
     naughty     0.70
     kidding     0.70
     Bastard     0.69
     Didn        0.68
    Act Density 0.319%

    No Known Activations