INDEX
    Explanations

    questions introduced with the word "What"

    questions formulated with "What"

    Negative Logits
    Token       Logit
    "shore"     -0.66
    "lich"      -0.65
    "general"   -0.64
    "roads"     -0.64
    "ulic"      -0.63
    "Lago"      -0.63
    "ability"   -0.63
    "println"   -0.62
    "gi"        -0.60
    "raction"   -0.59
    Positive Logits
    Token               Logit
    "soever"            1.24
    " happens"          1.04
    " Lies"             0.98
    " happened"         0.94
    " distinguishes"    0.94
    " transpired"       0.92
    " Makes"            0.90
    " separates"        0.88
    " Difference"       0.85
    " happ"             0.83
    Activation Density: 0.081%

    No Known Activations