INDEX
Explanations: female, character descriptions
Negative Logits
ה  1.57
מ  1.54
ن  1.51
ない  1.38
א  1.36
м  1.34
ları  1.33
padă  1.32
ي  1.31
n  1.30
Positive Logits
elling  1.20
(not shown)  1.19
het  1.05
)  1.04
for  0.96
male  0.94
ess  0.94
ized  0.94
ash  0.93
age  0.91
Activations Density 0.023%
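
As a rough illustration of the density figure above, here is a minimal sketch that assumes "activations density" means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: density = (number of active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 23 of them active.
sample = [0.0] * 99_977 + [1.5] * 23
density = activation_density(sample)
print(f"{density * 100:.3f}%")  # prints "0.023%"
```

Under this reading, a density of 0.023% would correspond to the feature firing on about 23 of every 100,000 tokens.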