INDEX
Explanations
descriptive adjectives or specific terms
Negative Logits
Token    Logit
Y        1.18
M        1.15
Z        1.13
P        1.08
E        1.07
F        1.07
T        1.07
R        1.07
K        1.06
Z        1.04
Positive Logits
Token        Logit
socalled     1.27
famously     1.27
cosidd       1.27
sogen        1.19
tzv          1.19
quefois      1.15
""""         1.15
roversial    1.15
dakkh        1.14
berühm       1.14
Activations Density: 2.636%