Explanations
explaining, describing, or listing actions
Negative Logits
সহজেই    0.67
خیلی    0.65
খুব    0.64
foarte    0.64
extraordinaire    0.63
очень    0.61
بالکل    0.60
इजीली    0.58
muito    0.57
вполне    0.56
Positive Logits
ቶችን    0.47
протяжении    0.45
每一    0.43
Muhammadu    0.39
ධ    0.39
interdependent    0.38
REMOVE    0.38
awọn    0.38
stoichiometry    0.38
TERMIN    0.37
Activations Density: 0.188%