Explanations
rankings or placements in various competitive contexts
Negative Logits
top       -0.19
top       -0.18
erland    -0.17
topo      -0.17
eron      -0.16
rze       -0.16
_top      -0.16
TOP       -0.15
tops      -0.15
anz       -0.15

Positive Logits
tier      0.26
-tier     0.26
most      0.25
pling     0.24
flight    0.23
ech       0.23
gross     0.21
-notch    0.21
dogs      0.21
Tier      0.21
Activations Density: 0.027%