INDEX
Explanations
phrases related to rank or status
Negative Logits
besten      -0.18
weaker      -0.17
coolest     -0.17
위          -0.16
weir        -0.16
brightest   -0.15
erator      -0.15
最終        -0.15
simplest    -0.15
finest      -0.15
Positive Logits
third       0.22
second      0.20
joint       0.18
third       0.17
tied        0.17
joint       0.16
sixth       0.16
第三        0.16
fourth      0.15
asal        0.14
Activations Density 0.075%