INDEX
Explanations
instances of guidance or empowerment related to self-improvement, resources, or education
Negative Logits
avec      -0.14
ká        -0.14
Sunder    -0.14
ysz       -0.14
shit      -0.14
.df       -0.13
ierce     -0.13
rosso     -0.13
overall   -0.13
borrow    -0.13
Positive Logits
Wis          0.27
jud          0.26
advantage    0.25
differently  0.23
wisely       0.23
wis          0.22
wisdom       0.20
towards      0.20
smart        0.20
Advantage    0.20
Activation Density: 0.092%