    Explanations

    instances of the word "acting."

    Negative Logits
    Token                 Logit
    "↵↵"                  -0.70
    (token missing)       -0.67
    "<eos>"               -0.61
    (token missing)       -0.59
    " The"                -0.56
    "."                   -0.54
    ","                   -0.53
    " And"                -0.53
    " ("                  -0.52
    (whitespace)          -0.51
    Positive Logits
    Token                 Logit
    " acted"              1.68
    " acting"             1.64
    " act"                1.57
    "acting"              1.51
    "Acting"              1.50
    "cted"                1.50
    "acts"                1.46
    " Acting"             1.43
    "act"                 1.40
    " ACT"                1.39
    Act Density 0.073%

    No Known Activations