INDEX
    Explanations

    references to knowledge and understanding in various contexts

    Negative Logits
    Token                Logit
    "StructEnd"          -0.58
    "AndEndTag"          -0.56
    "WindowConstants"    -0.56
    " Walkover"          -0.56
    "SBATCH"             -0.55
    "dymyr"              -0.55
    "neko"               -0.53
    "insegna"            -0.52
    "jelent"             -0.52
    "لاة"                -0.51
    Positive Logits
    Token                Logit
    " base"              1.01
    "base"               0.96
    " Knowledge"         0.86
    "Knowledge"          0.85
    " Base"              0.83
    " gained"            0.83
    " knowledge"         0.83
    "knowledge"          0.82
    " bases"             0.82
    "bases"              0.81
    Activation Density: 0.088%

    No Known Activations