INDEX
Explanations
proper nouns and specific names
Negative Logits
Token      Logit
Hob        -0.15
ụn         -0.14
Zhu        -0.12
رÙĪØ¯      -0.12
.zh        -0.11
Elsa       -0.11
Tibetan    -0.11
krv        -0.10
ë¶         -0.10
Spokane    -0.10
Positive Logits
Token      Logit
Cox        0.69
ox         0.66
ox         0.65
Sax        0.64
CX         0.64
Rex        0.63
Pax        0.62
FX         0.60
Lexington  0.60
Rox        0.60
Activations Density 1.272%