INDEX
    Explanations

    numerical representations or references that quantify specific information

    Negative Logits
     himſelf         -0.85
     myſelf          -0.79
     raiſ            -0.78
     themſelves      -0.76
     purpoſe         -0.72
     ſmall           -0.72
     defaultstate    -0.70
     tranſ           -0.69
     itſelf          -0.69
     ſtate           -0.69
    Positive Logits
    7    1.03
    9    1.02
    5    1.02
    6    1.01
    8    1.00
    4    0.99
    3    0.96
    0    0.93
    2    0.91
    1    0.89
    Act Density 1.446%

    No Known Activations