INDEX
Explanations
words related to motivation and support
Negative Logits
.Companion    -0.18
chers         -0.16
omp           -0.15
egan          -0.15
lay           -0.15
ocular        -0.15
ural          -0.15
ieren         -0.15
ilder         -0.15
ppy           -0.15
Positive Logits
/prom          0.22
participation  0.19
/disable       0.19
/support       0.18
agement        0.18
Participation  0.16
ouver          0.15
us             0.15
umber          0.15
others         0.15
Activations Density 0.029%