INDEX
Explanations
phrases related to upper and lower levels or classes, and their associated qualities
Negative Logits
sse      -0.16
shed     -0.15
efe      -0.15
дал      -0.15
eland    -0.14
Svens    -0.14
eph      -0.14
emu      -0.13
rgb      -0.13
eur      -0.13
Positive Logits
most      0.44
MOST      0.27
-middle   0.26
reaches   0.25
-most     0.25
cased     0.24
/l        0.24
class     0.24
ech       0.23
archy     0.23
Activations Density 0.031%
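
The density figure above is not defined on this page. A common reading, assumed here, is the percentage of tokens on which the feature's activation is non-zero. The sketch below shows that calculation under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires. This is an illustrative definition, not one stated by the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the feature fires on 31 of 100,000 tokens.
sample = [1.0] * 31 + [0.0] * 99_969
print(f"{activation_density(sample):.3f}%")  # prints 0.031%
```

Under this reading, a density of 0.031% would mean the feature is active on roughly 3 tokens in every 10,000.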