Explanations
instances of discussions surrounding safety, emergencies, and hospital-related themes
Negative Logits
出了      -0.16
atti      -0.16
udge      -0.15
lev       -0.15
ollo      -0.15
trat      -0.15
erken     -0.14
adem      -0.14
ır        -0.14
去了      -0.14
Positive Logits
ählen      0.25
ichern     0.22
ieren      0.21
reiben     0.21
machen     0.21
icken      0.20
tun        0.20
uchen      0.20
bringen    0.19
ehen       0.18
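The two lists above rank vocabulary tokens by how strongly this feature raises or lowers their next-token logits. One common way such dashboards compute these scores is to project the feature's decoder vector through the model's unembedding matrix and keep the k largest and k smallest entries. That method is assumed here and is not confirmed by this page. The helpers `logit_effects` and `top_and_bottom` are hypothetical, pure-Python sketches of the projection, and the toy numbers are made up:

```python
# Hypothetical sketch (assumption): logit effect of a feature on each token is
# dot(decoder_vector, column j of the unembedding matrix).

def logit_effects(decoder_vector, unembed, vocab):
    """Return (token, score) pairs.

    unembed is a list of d_model rows, each with one entry per vocab token.
    """
    d_model = len(decoder_vector)
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_vector[i] * unembed[i][j] for i in range(d_model))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Split scores into the k most negative and the k most positive tokens."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional example with made-up numbers (not from this feature).
    decoder = [1.0, 0.5]
    unembed = [[0.2, -0.1, 0.0, 0.3],
               [0.1, 0.2, -0.4, 0.0]]
    neg, pos = top_and_bottom(logit_effects(decoder, unembed, ["a", "b", "c", "d"]), 2)
    print([t for t, _ in neg], [t for t, _ in pos])  # ['c', 'b'] ['d', 'a']
```

Under this reading, a positive score such as 0.25 for "ählen" would mean the feature nudges the model toward predicting that token, and a negative score would mean it nudges the model away from it.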
Activation Density: 0.027%
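This sketch assumes that "activation density" means the percentage of sampled tokens on which the feature's activation is strictly positive. That is a common SAE dashboard convention and is not stated on this page. The helper `activation_density` is hypothetical:

```python
# Hypothetical sketch (assumption): density = share of tokens where the
# feature activation is > 0, expressed as a percentage.

def activation_density(activations):
    """Return the percentage of activations that are strictly positive."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Made-up sample: 27 firing tokens out of 100,000.
    sample = [1.0] * 27 + [0.0] * (100_000 - 27)
    print(activation_density(sample))
```

Under this reading, 0.027% corresponds to roughly 27 firing tokens per 100,000, which makes this a very sparse feature.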