INDEX
Explanations
military honorifics and awards
Negative Logits
909             -0.15
Levi            -0.15
師              -0.14
954             -0.14
verg            -0.13
dilig           -0.13
-opacity        -0.13
Transparency    -0.13
Alta            -0.13
شب              -0.13
Positive Logits
citation        0.26
citations       0.25
Citation        0.24
Purple          0.23
Streamer        0.23
ribbon          0.22
Ribbon          0.22
rib             0.21
decoration      0.20
oak             0.20
Activations Density: 0.018%