Explanations
terms related to authority and governance
Negative Logits
entin       -0.16
ød          -0.14
oki         -0.14
lish        -0.14
fancy       -0.14
ек          -0.14
ango        -0.14
.virtual    -0.13
uf          -0.13
gaze        -0.13
Positive Logits
Larger      0.26
larger      0.24
largest     0.22
large       0.21
large       0.21
bigger      0.20
LARGE       0.20
Large       0.20
-largest    0.19
smaller     0.19
Activations Density: 0.008%