INDEX
    Explanations

    instances of words related to strong reasoning or arguments

    expressions of strong interest or persuasive arguments

    Negative Logits
    Token         Logit
    "hops"        -0.92
    "sterdam"     -0.77
    "keepers"     -0.69
    "paces"       -0.69
    "chie"        -0.67
    "hop"         -0.66
    "isites"      -0.63
    "clair"       -0.63
    "usterity"    -0.62
    "keeper"      -0.62
    Positive Logits
    Token         Logit
    "ibly"        1.09
    " proofs"     0.95
    " evidence"   0.93
    " arguments"  0.91
    "ingly"       0.91
    " argument"   0.90
    "ly"          0.88
    "ible"        0.87
    " proof"      0.86
    "evidence"    0.86
    Activation Density: 0.104%

    No Known Activations