INDEX
    Explanations

    the word "operations" and sometimes words associated with managing the temperature of an object

    Negative Logits
    -0.73
    1
    -0.72
    2
    -0.71
    the
    -0.67
    5
    -0.65
      
    -0.61
    0
    -0.59
    3
    -0.59
    6
    -0.58
    ↵↵
    -0.58
    Positive Logits
     purpoſe       1.47
     ſtate         1.43
     pleaſure      1.37
     poffe         1.35
     itſelf        1.30
     juſt          1.26
     reaſon        1.25
     ainfi         1.24
     themſelves    1.24
     himſelf       1.24
    Act Density 0.537%

    No Known Activations