INDEX
Explanations
affirmative responses and degrees of confidence in dialogue
Negative Logits
transfieras   -0.44
だよね        -0.40
Yeah          -0.37
kid           -0.37
freakin       -0.36
みんなの      -0.36
👭            -0.35
んだよね      -0.35
Gotta         -0.34
đứa           -0.34
Positive Logits
sir           2.16
Sir           1.98
Sir           1.97
sir           1.70
SIR           1.55
SIR           1.51
Sirs          1.34
madam         1.30
Madam         1.21
senhor        1.16
Activations Density: 0.318%