    Explanations

    words or phrases related to reasoning or justification

    words related to rationality and justification

    Negative Logits
        "sten"          -0.66
        "raped"         -0.65
        "etry"          -0.63
        "cutting"       -0.63
        " Sina"         -0.62
        "chi"           -0.61
        " blackout"     -0.59
        "facts"         -0.58
        "pees"          -0.58
        "charged"       -0.57
    Positive Logits
        " insofar"       0.85
        " concern"       0.77
        " altru"         0.72
        " curiosity"     0.72
        " intu"          0.70
        "arily"          0.69
        " indignation"   0.68
        " sympath"       0.68
        " applaud"       0.66
        ";;;;;;;;;;;;"   0.65
    Activation Density: 0.233%
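Activation density is the share of token positions on which the feature fires. The following is a minimal sketch, not the dashboard's own code. It assumes "fires" means an activation strictly above zero, since the threshold used here is not stated, and it uses made-up activations. It shows how a figure like 0.233% arises: about 2,330 firing positions per million tokens, or roughly one token in 430. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly greater than
    `threshold` (default 0.0); the dashboard's actual threshold is not stated.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 1,000,000 token positions, 2,330 of which fire.
acts = [0.0] * 1_000_000
for i in range(2_330):
    acts[i * 400] = 1.5  # distinct positions spaced 400 apart

density = activation_density(acts)
print(f"{density:.3%}")  # 0.233%
```

Because only positions above the threshold are counted, raising `threshold` can only lower the reported density.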

    No Known Activations