INDEX
Explanations
objects located at the top of something
occurrences of the word "top" in various contexts
Negative Logits
Token      Logit
ellow      -0.75
ufact      -0.74
agra       -0.65
theless    -0.62
upon       -0.62
_-         -0.61
Inqu       -0.61
Leone      -0.60
gm         -0.60
fellow     -0.59
Positive Logits
Token      Logit
most       1.18
liest      0.82
iary       0.78
thereof    0.77
tier       0.73
Bottom     0.69
bottom     0.69
scorer     0.68
rope       0.67
top        0.66
Activations Density: 0.062%