INDEX
Explanations
The neuron activates on salient content words, especially named entities, dates and numbers, and topic-specific keywords (important nouns and terms).
Negative Logits
p    0.56
ंप    0.52
Mens    0.49
دور    0.47
Misc    0.46
Madness    0.45
त्मक    0.44
funktion    0.44
Measures    0.44
طلع    0.44
Positive Logits
নি    0.54
}}    0.53
یسم    0.48
sudut    0.48
شي    0.47
tı    0.46
өлү    0.46
држа    0.45
warrantless    0.45
سه    0.45
Activations Density: 1.041%