INDEX
    Explanations

    instances of the word 'Truth'

    occurrences of an end-of-text token

    Negative Logits
    Token        Logit
    "ounter"     -0.87
    "ITIES"      -0.76
    "ATIONAL"    -0.74
    "astical"    -0.73
    " chem"      -0.69
    " evid"      -0.69
    "âĶĢâĶĢ"       -0.67
    "iners"      -0.67
    "ATED"       -0.66
    "AMES"       -0.65

    Positive Logits
    Token        Logit
    "ful"        1.04
    " Force"     0.96
    "Works"      0.94
    "bilt"       0.90
    " Control"   0.88
    " Machine"   0.87
    " Girl"      0.85
    "bringer"    0.85
    " Matters"   0.85
    " Sisters"   0.85

    Activation Density: 0.125%

    No Known Activations