INDEX
Explanations
philanthropic endeavors and advice
Negative Logits
s     1.09
on    1.06
ä     1.01
as    0.97
ro    0.97
2     0.91
r     0.88
g     0.86
os    0.85
c     0.80
Positive Logits
ק                0.89
philanthropic    0.84
娯                0.84
artístico        0.80
przyję           0.79
arrêté           0.79
peruse           0.77
impose           0.77
apoyo            0.76
ও                0.75
Activations Density: 0.004%