NEGATIVE LOGITS
Token            Logit
superficies      -0.08
raw              -0.08
sp               -0.07
remov            -0.07
chosen           -0.07
hindi            -0.07
false            -0.07
noise            -0.07
ంద్ర             -0.07   (Telugu subword fragment)
possibilities    -0.07
POSITIVE LOGITS
Token            Logit
responsibly       0.13
courteous         0.11
respectfully      0.11
respeto           0.10   (Spanish, "respect")
politely          0.10
सम्मान            0.10   (Hindi, "respect")
respectful        0.10
altru             0.10
kindly            0.10
etiquette         0.10
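The two lists above are the extremes of a single ranking. As a minimal sketch, assuming the dashboard simply sorts every vocabulary token by its logit value and keeps the k lowest and the k highest (k = 10 here), the selection could look like this. `split_extremes` is a hypothetical helper, and the dictionary holds only a few of the values listed above, not the full vocabulary.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]

def split_extremes(logits: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, logit) pairs.

    Hypothetical helper: assumes a plain sort over all tokens.
    sorted() is stable, so tied tokens keep their input order.
    """
    ranked = sorted(logits.items(), key=lambda item: item[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive

# A small subset of the values shown in the lists above.
sample = {
    "superficies": -0.08,
    "raw": -0.08,
    "noise": -0.07,
    "responsibly": 0.13,
    "courteous": 0.11,
    "kindly": 0.10,
}

neg, pos = split_extremes(sample, k=2)
print(neg)  # [('superficies', -0.08), ('raw', -0.08)]
print(pos)  # [('responsibly', 0.13), ('courteous', 0.11)]
```

With a real vocabulary of tens of thousands of tokens, k is far smaller than half the vocabulary, so the two lists never overlap.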
ACTIVATION DENSITY: 0.004%
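Reading activation density as the share of tokens on which the feature activates (an assumption about how the figure is defined), 0.004% equals 4 × 10⁻⁵, or roughly 4 activating tokens per 100,000. That makes this a very sparse feature. A minimal sketch of the calculation, using a hypothetical helper `activation_density` and illustrative toy activations rather than real data:

```python
from typing import List

def activation_density(activations: List[float]) -> float:
    """Fraction of tokens with a strictly positive activation.

    Hypothetical helper; "positive means firing" is an assumed definition.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)

# Toy example: 4 firing tokens out of 100,000 (illustrative values only).
acts = [0.0] * 100_000
for i in (10, 2_500, 40_000, 99_999):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.004%
```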