    Explanations

    phrases indicating understanding or familiarity with a subject

    Negative Logits
    Token             Logit
    "ingen"           -0.16
    "unker"           -0.15
    "andum"           -0.15
    "ÑıÑĩ"            -0.14
    "áš"              -0.14
    ".bool"           -0.14
    "ÙĤÙĬ"            -0.13
    "Verification"    -0.13
    "éħ"              -0.13
    "lix"             -0.13
    Positive Logits
    Token               Logit
    " understanding"    0.50
    " know"             0.50
    " knowledge"        0.49
    " knows"            0.48
    " understand"       0.48
    " understands"      0.48
    "çŁ¥éģĵ"            0.42
    " knowing"          0.42
    " knew"             0.42
    " KNOW"             0.41
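Several tokens in the logit lists (for example çŁ¥éģĵ) look like byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the vocabulary uses the GPT-2 style byte-to-character table; that is an assumption about this dashboard's tokenizer, not something the page states. The names `bytes_to_unicode` and `decode_token` are illustrative.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte value to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into text (assumed GPT-2 mapping)."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space) pass through unchanged.
            raw.extend(ch.encode("utf-8"))
    # Tokens can end mid-character, so undecodable bytes are replaced, not raised.
    return raw.decode("utf-8", errors="replace")


print(decode_token("çŁ¥éģĵ"))  # 知道 ("to know")
```

Under that assumption, the one non-English entry among the positive logits is the Chinese verb for "know", which fits the feature's explanation. Entries such as éħ hold only part of a multi-byte character and do not decode to a complete character on their own.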
    Act Density 0.425%
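As a rough illustration of the density figure, the sketch below treats activation density as the fraction of sampled tokens on which the feature fires. That definition, the function name `activation_density`, and the sample values are assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 4 tokens.
sample = [0.0, 0.0, 1.3, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.425% would mean the feature is active on roughly 4 of every 1,000 tokens.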

    No Known Activations