INDEX
    Explanations

    The neuron activates on salient content words, especially named entities, dates and numbers, and topic-specific keywords (important nouns and terms).

    Negative Logits
        "p"            0.56
        "ंप"            0.52
        " Mens"        0.49
        " دور"         0.47
        " Misc"        0.46
        " Madness"     0.45
        "त्मक"          0.44
        " funktion"    0.44
        " Measures"    0.44
        "طلع"          0.44
    Positive Logits
        "নি"            0.54
        "}}"           0.53
        "یسم"          0.48
        " sudut"       0.48
        "شي"           0.47
        (token not shown)  0.46
        "өлү"          0.46
        " држа"        0.45
        " warrantless" 0.45
        "سه"           0.45
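    The page does not say how these logit lists are produced. One common approach, assumed here rather than confirmed, is direct logit attribution: dot the neuron's output weight vector with each vocabulary column of the unembedding matrix, then keep the k highest and k lowest scores. The sketch below uses plain Python lists; `neuron_logit_effects`, `top_and_bottom_tokens`, and the toy weights are all hypothetical.

```python
# Hypothetical sketch: assumes the listed logits are direct-logit-attribution
# scores (neuron output weights projected through the unembedding matrix).

def neuron_logit_effects(w_out, W_U):
    """Score every vocabulary token by how much this neuron pushes its logit.

    w_out: list of d_model floats (the neuron's output weights, assumed).
    W_U:   unembedding matrix as d_model rows, each of length vocab_size.
    """
    vocab_size = len(W_U[0])
    return [sum(w_out[i] * W_U[i][t] for i in range(len(w_out)))
            for t in range(vocab_size)]


def top_and_bottom_tokens(scores, vocab, k):
    """Return the k most promoted (positive) and k most suppressed (negative) tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # lowest scores first
    positive = ranked[::-1][:k]      # highest scores first
    return positive, negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_out = [1.0, -0.5]
W_U = [[0.2, 0.4, -0.1],
       [0.6, -0.2, 0.3]]
vocab = ["a", "b", "c"]
scores = neuron_logit_effects(w_out, W_U)
positive, negative = top_and_bottom_tokens(scores, vocab, k=1)
print(positive, negative)
```

    In the toy example token "b" is promoted the most and "c" is suppressed the most; a real dashboard would run the same ranking over the model's full vocabulary, which is why subword fragments such as "ंप" or "}}" can appear alongside whole words.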
    Activation Density: 1.041%
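    Activation density is usually read as the share of token positions on which the neuron fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the page does not state its threshold, and `activation_density` is a hypothetical helper.

```python
# Hypothetical sketch: assumes density = percentage of token positions whose
# activation exceeds a threshold (0.0 here; the page's threshold is unknown).

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions where the activation exceeds threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.3, 0.0, 0.0]))  # one of four positions fires
```

    Under this reading, a density of 1.041% means the neuron fires on roughly one token in 96 (100 / 1.041 ≈ 96.1), which fits the explanation that it responds selectively to salient content words rather than to most tokens.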

    No Known Activations