Explanations
addressing masters and royalty
Negative Logits
interesante    0.47
interesting    0.47
intriguing     0.46
arkadaşlar     0.44
sympathique    0.44
интерес        0.43
guys           0.43
좋아           0.43
લોકોને         0.42
интересных     0.42
Positive Logits
humbly         0.98
humble         0.93
humild         0.72
Humble         0.71
陛下           0.68
servant        0.65
unworthy       0.65
Master         0.63
🙇             0.63
respectfully   0.62
Activations Density 0.008%
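
The page does not define activation density. A minimal sketch, assuming it means the share of sampled token positions on which the feature fires above zero: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a figure of the same size as the one reported above.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled token positions on which
    the feature is active. The source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 8 of 100,000 token positions.
sample = [0.0] * 99_992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.008% corresponds to roughly 8 active positions per 100,000 sampled tokens, which marks the feature as very sparse.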