Explanations
mentions of specific individuals or organizations, particularly in contexts related to healthcare or governance
Negative Logits
resh      -0.18
855       -0.16
typ       -0.15
kı        -0.15
Linden    -0.15
arel      -0.14
ercul     -0.14
wood      -0.14
scar      -0.14
atra      -0.13

Positive Logits
नà¤ķ       0.15
dames      0.15
ngo        0.14
arters     0.14
iminal     0.14
овж        0.14
_YUV       0.14
çĬ         0.14
endum      0.14
ieren      0.14
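The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). Below is a minimal sketch of that ranking step. It assumes the per-token logit effects have already been computed. In sparse-autoencoder dashboards these are commonly obtained by projecting the feature's decoder direction onto the model's unembedding, but treat that as an assumption here. The function name `top_logits` and the sample values are illustrative only.

```python
def top_logits(effects, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    `effects` is a hypothetical dict mapping token strings to logit effects.
    """
    # Sort ascending by effect so the most negative tokens come first.
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]
    # Reverse to put the most positive tokens first.
    positive = ranked[::-1][:k]
    return negative, positive


# Illustrative subset of the values listed above.
effects = {"resh": -0.18, "855": -0.16, "dames": 0.15, "ngo": 0.14}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('resh', -0.18), ('855', -0.16)]
print(pos)  # [('dames', 0.15), ('ngo', 0.14)]
```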
Activation Density 0.002%
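Activation density is usually read as the share of token positions on which the feature fires. A value of 0.002% corresponds to roughly 2 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name and the threshold are illustrative assumptions, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    `threshold=0.0` is an assumed firing criterion, not a documented one.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```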