Explanations
words related to hierarchical status or positions
Negative Logits
ogany         -0.17
ackage        -0.16
Nova          -0.16
asta          -0.15
vertisement   -0.14
bage          -0.14
shake         -0.14
leş           -0.14
oftware       -0.14
endale        -0.14
Positive Logits
point         0.17
stance        0.17
yonel         0.17
ality         0.17
positions     0.16
embro         0.15
hips          0.15
oles          0.15
=pos          0.15
OLE           0.14
Activations Density 0.052%