INDEX
    Explanations

    words and phrases related to correct or ideal practices

    Negative Logits
    Token          Logit
    "elic"         -0.16
    "   "          -0.16
    "arine"        -0.16
    "istics"       -0.15
    "IX"           -0.15
    "elder"        -0.15
    "AZY"          -0.15
    "/or"          -0.15
    "rop"          -0.15
    " possible"    -0.14
    Positive Logits
    Token             Logit
    " functioning"    0.22
    "proper"          0.21
    "-function"       0.20
    " Proper"         0.20
    " nouns"          0.20
    "ity"             0.18
    "ment"            0.17
    "noun"            0.17
    "fully"           0.17
    "bred"            0.17
    Activation Density: 0.030%

    No Known Activations