    Negative Logits
     teaching          0.53
    কে                 0.52
    [token missing]    0.52
    [token missing]    0.51
    [token missing]    0.50
    [token missing]    0.50
     교육               0.49
     تعليم             0.49
    [token missing]    0.49
     Teaching          0.49
    Positive Logits
    t     0.90
    r     0.87
    et    0.78
    ä     0.76
    to    0.69
    h     0.68
    c     0.65
    p     0.65
    k     0.65
    on    0.64
    Act Density 0.002%

    No Known Activations