INDEX
    Explanations

    statements expressing support or validation

    phrases related to support or validation

    Negative Logits
    Token           Logit
    "nel"           -0.75
    "anny"          -0.73
    "ities"         -0.70
    " newsp"        -0.69
    "pox"           -0.68
    "inational"     -0.65
    " Hebdo"        -0.65
    " irony"        -0.64
    "entric"        -0.63
    " ILCS"         -0.63
    Positive Logits
    Token           Logit
    "raise"         0.75
    "hard"          0.75
    "track"         0.73
    "byn"           0.73
    "abies"         0.71
    "drive"         0.69
    "ament"         0.67
    "taking"        0.66
    "GROUND"        0.65
    "lash"          0.65
    Activation Density: 0.038%

    No Known Activations