INDEX
Explanations
proper nouns and specific places
specific named entities and concepts
Negative Logits
Token    Value
8        0.63
9        0.54
ز        0.52
4        0.50
N        0.48
في       0.48
THR      0.48
or       0.47
6        0.47
im       0.47
Positive Logits
Token         Value
was           0.67
ਿੱ            0.54
;}            0.50
Passover      0.50
Gettysburg    0.50
Presiden      0.48
coda          0.48
EEOC          0.48
President     0.47
Isra          0.47
Activations Density: 0.675%