INDEX
Explanations
words and concepts related to capability and potential
Negative Logits
ing      -0.21
ed       -0.20
el       -0.18
ese      -0.17
ゥ       -0.16
olt      -0.16
arily    -0.16
edb      -0.16
egal     -0.16
ø        -0.15
Positive Logits
-bodied      0.21
heid         0.18
ummings      0.15
lisi         0.15
ilty         0.15
Jar          0.15
_OVERRIDE    0.14
raci         0.14
keit         0.14
로           0.14
Activations Density 0.161%