INDEX
Explanations
established or common ideas
Negative Logits
䧏  0.39
Dz  0.38
Primarily  0.38
叵  0.37
異なる  0.37
違う  0.37
इलेक्ट्रिक  0.37
ANZ  0.37
نہ  0.36
मंत्र  0.36
Positive Logits
commonplace  0.90
common  0.70
comum  0.69
常見  0.67
常见的  0.66
comuns  0.64
常见  0.63
আগেও  0.63
routinely  0.61
Already  0.61
Activations Density 0.659%