    Explanations

    the phrase 'common sense'

    references to the concept of common sense

    Negative Logits
        'atern'           -0.77
        'ETA'             -0.76
        'etsk'            -0.75
        'chrom'           -0.74
        'Stars'           -0.72
        'raph'            -0.71
        '\/\/'            -0.69
        'soon'            -0.67
        'bye'             -0.67
        'href'            -0.66
    Positive Logits
        ' ACTIONS'        0.91
        'smanship'        0.90
        'pants'           0.76
        'ensical'         0.73
        'Cola'            0.70
        ' dictates'       0.69
        ' constraints'    0.66
        'iness'           0.64
        ' imitation'      0.63
        ' Dynamics'       0.62
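    Dashboards like this one commonly derive the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative tokens. The sketch below assumes that procedure; the function names and the tiny toy vocabulary and weights are hypothetical illustrations, not this feature's actual weights.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot the feature's decoder direction with each
    token's unembedding column to get that token's direct logit effect."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive (descending) and k most negative (ascending) tokens."""
    top = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    bottom = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return top, bottom

# Toy 2-dimensional example (made-up numbers, chosen to be exact in floating point).
decoder_dir = [1.0, -0.5]
unembed = {
    " ACTIONS": [0.5, -0.5],   # 0.5 + 0.25  =  0.75
    "atern": [-0.5, 0.5],      # -0.5 - 0.25 = -0.75
    " dictates": [0.25, 0.5],  # 0.25 - 0.25 =  0.0
}
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=1)
print(top, bottom)
```

    Ranking by the raw dot product keeps the sign meaningful: positive values push the token's logit up when the feature fires, negative values push it down.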
    Activation Density: 0.041% (the feature is active on roughly 4 of every 10,000 tokens)

    No Known Activations