Explanations
words related to hierarchy or position
instances of the word "top" used in various contexts
Negative Logits
signed      -0.61
@@          -0.59
gm          -0.58
ouri        -0.56
edIn        -0.54
ventions    -0.54
inea        -0.52
greg        -0.52
gerald      -0.52
rw          -0.52
Positive Logits
most         1.08
level        0.91
tier         0.88
thereof      0.86
of           0.85
tiers        0.85
edge         0.84
levels       0.80
shelf        0.78
mast         0.78
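Dashboards of this kind typically build the two logit lists by projecting the feature's decoder direction onto the model's unembedding matrix, then reading off the most positive and most negative entries. The sketch below assumes exactly that, with no LayerNorm folding or bias terms. `top_logit_tokens`, `w_dec_row`, and `w_unembed` are hypothetical names, and the toy matrix only reuses a few values from the lists above.

```python
import numpy as np


def top_logit_tokens(w_dec_row, w_unembed, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature's decoder direction.

    Assumption: effect = w_dec_row @ w_unembed, ignoring LayerNorm and biases.
    w_dec_row: (d_model,) decoder vector for the feature (hypothetical name).
    w_unembed: (d_model, vocab_size) unembedding matrix (hypothetical name).
    """
    logits = w_dec_row @ w_unembed      # one score per vocabulary token
    order = np.argsort(logits)          # indices sorted from most negative upward
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: a 2-d model whose first residual dimension carries the feature.
vocab = ["most", "level", "tier", "gm", "signed"]
w_dec_row = np.array([1.0, 0.0])
w_unembed = np.array([
    [1.08, 0.91, 0.88, -0.58, -0.61],
    [0.30, -0.20, 0.10, 0.40, -0.50],
])
positive, negative = top_logit_tokens(w_dec_row, w_unembed, vocab, k=2)
print(positive)   # [('most', 1.08), ('level', 0.91)]
print(negative)   # [('signed', -0.61), ('gm', -0.58)]
```

Because only the decoder direction and the unembedding enter the calculation, these lists describe what the feature pushes the output toward, not the contexts in which it fires.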
Activation Density: 0.047%
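An activation density of 0.047% means the feature fires very sparsely, on roughly one token in 2,100. Below is a minimal sketch of the calculation. It assumes density is the fraction of tokens whose activation is strictly positive. `activation_density` is a hypothetical helper, and the activations are made-up toy data chosen to reproduce the figure.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature fires (activation > threshold).

    Assumption: "density" counts any strictly positive activation as firing.
    """
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Toy data: 47 firing tokens out of 100,000 reproduces the 0.047% figure.
acts = np.zeros(100_000)
acts[:47] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")          # 0.047%
print(round(1 / density))        # 2128, i.e. roughly one token in 2,128
```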