Explanations
corruption charges and scandals
Negative Logits
Token    Value
م        1.88
ן        1.50
ם        1.32
ش        1.31
ج        1.27
ம்       1.26
на       1.24
й        1.21
ння      1.16
it       1.15
Positive Logits
Token    Value
t        1.67
3        1.47
ti       1.33
r        1.33
li       1.27
h        1.23
2        1.20
v        1.19
ty       1.08
token    1.03
Activations Density: 0.001%