INDEX
    Explanations

    words related to incorrect information or judgments

    negative assessments or criticisms of concepts and arguments

    Negative Logits
    interrupted    -0.96
     downed    -0.72
    inder    -0.67
    rolled    -0.67
    runners    -0.65
    illas    -0.64
    gins    -0.62
    rollers    -0.62
    hens    -0.61
     disbanded    -0.60
    Positive Logits
     insofar    0.86
     simplistic    0.84
    headed    0.82
     analogy    0.76
     extrap    0.76
     rhetorical    0.75
    ctive    0.75
     empir    0.75
     underest    0.75
     logic    0.74
    Activation Density: 0.211%

    No Known Activations