Explanations
understanding and consideration
understanding the
Negative Logits
၊  0.96
،  0.90
、  0.82
፣  0.78
ül  0.75
ny  0.74
nel  0.72
ng  0.71
ltry  0.71
,“  0.71
Positive Logits
ר  1.31
ور  1.12
:  1.02
મ  1.02
ي  1.00
ב  0.98
ות  0.97
n  0.97
ה  0.97
ر  0.95
Activation Density: 0.847%