INDEX
Explanations
references to a specific individual named Xi
Negative Logits
(          -0.45
ffine      -0.43
bul        -0.43
</em>      -0.41
Duf        -0.39
hari       -0.38
stylers    -0.38
</i>       -0.38
gur        -0.38
delic      -0.36
Positive Logits
Xi          2.03
Xi          1.73
Jinping     1.13
xi          0.91
习近平      0.88
XI          0.77
houſe       0.76
plufieurs   0.75
ainfi       0.75
XI          0.73
Activations Density: 0.004%