INDEX
    Explanations

    concepts or notions that are framed as "ideas" related to various topics

    Negative Logits
    Token        Logit
    " Neutral"   -0.17
    "Builders"   -0.15
    "endor"      -0.14
    "Neutral"    -0.14
    "ir"         -0.14
    " Barry"     -0.14
    "оре"        -0.14
    "emp"        -0.14
    "cg"         -0.14
    "vit"        -0.14
    Positive Logits
    Token        Logit
    " notion"    0.25
    "idea"       0.23
    " concept"   0.23
    " idea"      0.23
    " Idea"      0.22
    " behind"    0.22
    " notions"   0.20
    "Concept"    0.18
    " premise"   0.17
    " concepts"  0.17
    Act Density 0.034%
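
A density figure like the one above is commonly read as the fraction of token positions on which the feature fires. The sketch below illustrates that reading only; the definition (activation strictly above a threshold of 0), the function name `activation_density`, and the sample data are assumptions for illustration, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: a position counts as active when its value is
    strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 34 of which are active.
sample = [0.0] * 99_966 + [1.0] * 34
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.034%
```

Under this reading, a density of 0.034% corresponds to roughly 3 to 4 active positions per 10,000 tokens.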

    No Known Activations