    Explanations

    comparisons and evaluations related to quantity

    Negative Logits
    Token         Logit
    "onut"        -0.77
    "bons"        -0.72
    "licts"       -0.71
    "ividual"     -0.71
    "network"     -0.71
    "ggles"       -0.69
    "prus"        -0.67
    "ourn"        -0.67
    "neys"        -0.67
    "oké"         -0.65
    Positive Logits
    Token                 Logit
    " understatement"     1.07
    " description"        0.84
    " exaggeration"       0.83
    " bullshit"           0.83
    " rhetorical"         0.79
    " untrue"             0.79
    " reassuring"         0.79
    " explanation"        0.78
    " characterization"   0.78
    " conjecture"         0.77
    Activation Density: 0.186%

    No Known Activations