INDEX
Explanations
names or mentions of individuals or organizations
specific letters, predominantly 'c' and 'd', within the text
Negative Logits
å§«      -0.75
LEASE    -0.72
SUP      -0.68
é¾       -0.67
BILITY   -0.65
è»       -0.65
IGHTS    -0.65
bells    -0.65
ebted    -0.64
åħī      -0.64
Positive Logits
uta      0.86
ama      0.84
ara      0.82
ula      0.81
oslav    0.79
ana      0.79
ka       0.76
ul       0.75
ija      0.75
Si       0.74
Activations Density 0.152%
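
The page does not define the density figure. One common reading, taken here as an assumption, is the fraction of token positions on which the feature's activation is above zero. A minimal sketch under that assumption follows; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 3 active positions out of 2000 tokens.
sample = [0.0] * 1997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

On this reading, a density of 0.152% would mean the feature fires on roughly 1 in 660 tokens, which marks it as a sparse feature.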