Explanations
mentions of specific names or terms related to individuals or entities
Negative Logits
Token   Logit
sy      -0.23
sch     -0.20
su      -0.20
sell    -0.18
sj      -0.18
د       -0.17
sie     -0.17
s       -0.17
sen     -0.16
sil     -0.16
Positive Logits
Token   Logit
urved   0.22
enne    0.21
yyyy    0.21
eur     0.21
theon   0.21
den     0.20
oncé    0.20
eb      0.19
ton     0.19
ama     0.19
Activations Density: 0.098%