INDEX
Explanations
references to human beings and aspects of human existence
Negative Logits
strecke      -0.40
boste        -0.36
respective   -0.36
kệ           -0.35
Besten       -0.35
treo         -0.35
spill        -0.34
parfüm       -0.34
Höhe         -0.34
specifico    -0.34
POSITIVE LOGITS
Human        1.41
human        1.36
Human        1.34
human        1.27
HUMAN        1.23
HUMAN        1.20
Humans       1.05
Humans       1.05
humans       1.02
humans       1.01
Activations Density 0.142%
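
The density figure is not defined on this page. A common reading, assumed here, is the fraction of token positions on which the feature fires (activation above zero). The sketch below illustrates that reading only: `activation_density` is a hypothetical helper, and the toy list is made-up data chosen to land on the same percentage, not the activations behind this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature is active.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up toy data: 142 active positions out of 100,000 tokens.
toy = [1.0] * 142 + [0.0] * (100_000 - 142)
print(f"{activation_density(toy) * 100:.3f}%")
```

Under this reading, a density of 0.142% would mean the feature is active on roughly 1 token in 700.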