INDEX
Explanations
expressions related to admiration and respect for individuals
positive descriptions and roles
Negative Logits
Token           Logit
dise            -0.45
новниш          -0.44
хьтан           -0.41
pal             -0.41
zahl            -0.40
staging         -0.39
unsatisfactory  -0.39
pal             -0.38
Appel           -0.38
final           -0.37
Positive Logits
Token           Logit
rungsseite      0.55
dignité         0.48
compétence      0.48
humanidade      0.47
sagesse         0.47
ۜ               0.46
testify         0.45
GenerationType  0.45
Infór           0.44
Memiliki        0.44
Activations Density: 0.024%