Explanations
watching someone or something
Negative Logits
পুরের  1.34
पीड़न  1.32
faptul  1.31
क  1.26
基づ  1.21
ج  1.21
ിയ  1.17
یز  1.17
ficando  1.15
пиа  1.13
Positive Logits
helplessly  1.72
tower  1.69
ה  1.58
া  1.54
a  1.50
tive  1.41
ا  1.38
xét  1.38
jší  1.36
нга  1.29
Activation Density: 0.043%