Explanations
words related to encouragement and support
Negative Logits
chers     -0.18
egan      -0.17
ural      -0.15
rans      -0.15
omp       -0.15
ook       -0.14
ãģ°       -0.14
mesi      -0.14
inux      -0.14
iculty    -0.14
Positive Logits
/prom            0.22
participation    0.21
/support         0.20
/disable         0.18
agement          0.18
Participation    0.16
others           0.16
ouver            0.15
ongs             0.15
preneur          0.15
Activations Density: 0.024%