Explanations
greetings followed by names or titles
Negative Logits
Token            Logit
dvara            0.59
bidirectional    0.54
interrelated     0.52
hypothalamic     0.52
nontrivial       0.51
akati            0.51
vatth            0.51
idiosyncratic    0.51
ሌሎች             0.51
搞               0.51
Positive Logits
Token            Logit
dear             1.63
sir              1.54
Dear             1.34
dearest          1.34
dear             1.34
sir              1.29
monsieur         1.27
جناب             1.26
Dear             1.25
Sir              1.23
Activations Density: 0.377%