INDEX
    Explanations

    references to logical reasoning and arguments

    references to logic, particularly in legal and philosophical contexts

    Negative Logits
    Token         Logit
    "semble"      -0.70
    "orks"        -0.70
    " Volunte"    -0.69
    "avez"        -0.68
    "ometown"     -0.68
    "hold"        -0.65
    "eneg"        -0.65
    "lain"        -0.64
    "affer"       -0.64
    "Shar"        -0.63
    Positive Logits
    Token         Logit
    " logic"       1.31
    " Logic"       0.98
    " droid"       0.84
    "istically"    0.83
    " appl"        0.81
    "matical"      0.76
    "ynes"         0.76
    " idi"         0.72
    " reasoning"   0.71
    "matically"    0.71
    Activation Density: 0.007%

    No Known Activations