INDEX
Explanations
No Explanations Found
Negative Logits
is    1.09
ায়    1.01
지    1.01
কে    0.98
ায়    0.98
د    0.97
ל    0.93
دين    0.93
inah    0.93
ai    0.92
Positive Logits
ور    1.12
ﻛ    0.93
médicas    0.84
к    0.82
РЕ    0.79
ਰ    0.79
pping    0.76
ra    0.75
अलावा    0.75
ෙන්ම    0.75
Activations Density 0.000%