Explanations
question words in different languages
Negative Logits
ইহাই  0.52
inbuilt  0.48
incul  0.45
exists  0.43
india  0.43
intrinsic  0.43
inherent  0.41
latest  0.41
agic  0.41
Это  0.40
Positive Logits
யார்  0.58
nasıl  0.57
어떻게  0.57
谁  0.57
誰  0.57
хто  0.57
когда  0.54
ktoś  0.54
όταν  0.54
кто  0.54
Activations Density 0.004%