INDEX
Explanations
social, political, or professional roles
Negative Logits
":"           1.45
"0"           1.23
"abstract"    1.20
"identical"   1.20
","           1.18
"important"   1.14
"advantage"   1.09
"tha"         1.04
"kudos"       1.03
")"           1.02
Positive Logits
"𝙖"           1.29
"彠"          1.29
"𝐞"           1.26
"clínica"     1.20
"躹"          1.16
"óloga"       1.15
"焲"          1.15
"𝑳"           1.15
"agna"        1.13
"presidente"  1.13
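Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that method; this dashboard does not state how its lists were computed, and the helper name `top_logits` and the toy matrices are hypothetical.

```python
def top_logits(decoder_vec, unembed, vocab, k=3):
    """Hypothetical helper (assumed method): rank vocabulary tokens by the
    dot product of a feature's decoder direction with each token's
    unembedding column.

    decoder_vec: list of d_model floats (the feature's decoder direction)
    unembed:     d_model rows, each a list of vocab_size floats
    vocab:       list of vocab_size token strings
    """
    scores = []
    for t, token in enumerate(vocab):
        # Score = <decoder direction, unembedding column for this token>
        s = sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        scores.append((token, s))
    # Highest scores: tokens the feature pushes up ("positive logits")
    positive = sorted(scores, key=lambda p: p[1], reverse=True)[:k]
    # Lowest scores: tokens the feature pushes down ("negative logits")
    negative = sorted(scores, key=lambda p: p[1])[:k]
    return positive, negative


# Toy example with d_model = 2 and a three-token vocabulary
pos, neg = top_logits([1.0, -0.5], [[0.2, 1.0, -1.0], [0.4, 0.0, 1.0]], ["a", "b", "c"], k=2)
print(pos)  # [('b', 1.0), ('a', 0.0)]
print(neg)  # [('c', -1.5), ('a', 0.0)]
```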
Activation Density: 0.400%
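Activation density is usually reported as the fraction of token positions on which a feature fires (activation above zero). Under that assumed definition, 0.400% means the feature is active on about 4 of every 1,000 tokens. The sketch below computes it from a list of per-token activations. The function name and the `threshold` parameter are hypothetical, not part of this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature's
    activation is strictly greater than `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 1,000 token positions, the feature fires on 4 of them
acts = [0.0] * 996 + [1.2, 0.3, 2.0, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.400%
```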