INDEX
    Explanations

    concepts and their descriptions

    Negative Logits
    Token           Logit
    "いました"      0.82
    "Him"           0.78
    "tidak"         0.77
    " начать"       0.74
    "him"           0.74
    " него"         0.74
    "তবে"           0.74
    "وه"            0.74
    "வதில்லை"       0.73
    "把他"          0.72
    Positive Logits
    Token           Logit
    " involved"     1.46
    " they"         1.39
    " needed"       1.29
    " we"           1.26
    " required"     1.23
    " used"         1.20
    " that"         1.15
    " mentioned"    1.14
    " being"        1.13
    " she"          1.12
    Activation Density: 1.659%

    No Known Activations