Explanations
professions and their actions
Negative Logits
kojim  0.33
which  0.31
alph  0.29
Likes  0.29
MyHomePage  0.29
얘가  0.29  (Korean, "this kid" + subject marker)
lightness  0.29
NSMutable  0.28
忘れ  0.27  (Japanese, stem of "to forget")
Maybe  0.27
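Positive Logits
extraordinaire  0.58
who  0.49
کرام  0.46  (Arabic/Urdu, "the honorable ones")
extraordin  0.46
którzy  0.45  (Polish, "who")
দের  0.44  (Bengali plural suffix "-der")
들에게  0.44  (Korean, plural marker + "to")
الذين  0.43  (Arabic, "those who")
ktorí  0.43  (Slovak, "who")
specialising  0.41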
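Logit lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. A minimal sketch of that projection follows, assuming plain Python lists for a hypothetical `decoder_dir` (length d_model) and `unembed` (d_model rows by vocab_size columns); the dashboard's exact procedure, including any normalization, is not stated here.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Score every vocabulary token by the feature's direct effect on its logit.

    decoder_dir: hypothetical feature decoder direction, list of d_model floats
    unembed: hypothetical unembedding matrix W_U, d_model rows of vocab_size floats
    vocab: list of token strings, one per unembedding column
    Returns (top_positive, top_negative), each a list of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # score[j] = sum_i decoder_dir[i] * W_U[i][j], i.e. decoder_dir @ W_U
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, scores))
    top_positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    top_negative = sorted(pairs, key=lambda p: p[1])[:k]
    return top_positive, top_negative


# Toy example with d_model = 2 and a three-token vocabulary (made-up numbers).
vocab = ["who", "which", "the"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.5, -0.2, 0.0],
    [0.2, -0.4, 0.1],
]
pos, neg = top_logit_effects(decoder_dir, unembed, vocab, k=1)
print(pos, neg)
```

In the toy data, "who" scores 1.0 × 0.5 + 0.5 × 0.2 = 0.6 and "which" scores 1.0 × (−0.2) + 0.5 × (−0.4) = −0.4, so they head the positive and negative lists respectively.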
Activation Density: 0.095%
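Activation density is the share of sampled tokens on which the feature fires; 0.095% corresponds to roughly 95 firings per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the sample size and threshold the dashboard actually uses are not stated here, and `activation_density` is a hypothetical helper name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Made-up sample: the feature fires on 95 of 100,000 tokens.
acts = [0.0] * 99_905 + [1.0] * 95
density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

Reporting the value as a percentage makes very sparse features like this one easier to compare than the raw fraction (0.00095).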