INDEX
Explanations
No Explanations Found
Negative Logits
Token    Logit
ב    1.66
ט    1.61
0    1.57
ات    1.50
д    1.48
(    1.34
טן    1.31
ک    1.23
0    1.23
ों    1.22

Positive Logits
Token    Logit
n    1.20
ure    1.13
p    1.05
m    1.02
man    1.02
و    0.99
é    0.99
aj    0.97
nte    0.96
ný    0.95
Activations Density 0.000%