INDEX
Explanations
Words related to autonomy or self-direction
Negative Logits
бина        -0.16
mani        -0.14
atar        -0.14
unga        -0.14
ovah        -0.14
ibraltar    -0.14
ĺìĿ´        -0.13
quip        -0.13
antino      -0.13
ảng         -0.13
Positive Logits
iful         0.18
лив          0.15
agi          0.15
sled         0.15
Liberties    0.14
baugh        0.14
inen         0.14
apest        0.14
sen          0.13
.poi         0.13
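The two logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. Here is a minimal sketch of how such a ranking could be produced. It assumes that a token's logit effect is the dot product of the feature's decoder vector with that token's column of the unembedding matrix, with no layer norm or rescaling. The names `decoder_vec`, `unembed`, `logit_effects`, and `top_and_bottom` are hypothetical, and the dashboard's real pipeline may differ.

```python
# Hypothetical sketch: rank tokens by the logit effect of one feature.
# ASSUMPTION: effect(token) = decoder_vec . unembed[:, token], with no
# layer norm or rescaling applied.

def logit_effects(decoder_vec, unembed, vocab):
    """Return {token: effect} for every token in vocab.

    decoder_vec: list of d_model floats (the feature's output direction).
    unembed: d_model rows, each a list of len(vocab) floats.
    """
    d_model = len(decoder_vec)
    effects = {}
    for t, token in enumerate(vocab):
        effects[token] = sum(decoder_vec[i] * unembed[i][t] for i in range(d_model))
    return effects


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative effect first
    positive = ranked[::-1][:k]  # most positive effect first
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.0, 0.3],
    [0.0, 0.4, -0.2, 0.1],
]
vocab = ["a", "b", "c", "d"]
effects = logit_effects(decoder_vec, unembed, vocab)
negative, positive = top_and_bottom(effects, k=2)
print(negative)
print(positive)
```

In the toy data, "b" has the most negative effect and "d" the most positive. Sorting once and slicing from both ends gives both lists in a single pass.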
Activation Density: 0.007%
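Activation density is usually read as the share of token positions on which the feature fires. Here is a minimal sketch under two assumptions: "fires" means an activation strictly above a threshold of 0, and the density is measured over some sample of tokens. The sample this dashboard used is not stated. The function name `activation_density` is hypothetical.

```python
# Hypothetical sketch: activation density = fraction of token positions
# where the feature's activation exceeds a threshold.
# ASSUMPTION: threshold = 0.0 (any positive activation counts as firing).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data: the feature fires on 7 of 100,000 token positions.
activations = [0.0] * 99_993 + [1.2] * 7
density = activation_density(activations)
print(f"{density * 100:.3f}%")
```

A density of 0.007% means the feature is active on about 7 of every 100,000 tokens, so it fires very rarely.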