INDEX
    Explanations

    instances of the word "know" across various contexts

    Negative Logits
    Token                   Logit
    "InstrumentedTest"      -0.40
    " חיצוניים"             -0.40
    "heds"                  -0.39
    "complexContent"        -0.37
    "TemporalType"          -0.37
    "jspb"                  -0.37
    " jambe"                -0.36
    " belang"               -0.36
    "使其"                  -0.36
    " occurrence"           -0.35
    Positive Logits
    Token                   Logit
    " know"                 0.92
    "know"                  0.83
    "Know"                  0.80
    " Know"                 0.79
    " KNOW"                 0.77
    "knows"                 0.75
    "我知道"                0.71
    " knows"                0.69
    " Knew"                 0.68
    " знаю"                 0.64
    Activation Density: 0.012%

    No Known Activations