INDEX
Explanations
references to potential choking hazards
Negative Logits
istrovství   -0.17
ril          -0.17
tl           -0.14
leurs        -0.14
rij          -0.14
زان          -0.14
sadd         -0.14
汗           -0.14
μά           -0.13
dein         -0.13
Positive Logits
swallowing   0.40
gag          0.39
choking      0.38
swallowed    0.35
throat       0.35
swallow      0.34
choked       0.31
choke        0.31
throat       0.30
å            0.29
Activations Density 0.074%
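
For a sense of scale, the density figure can be read as the fraction of sampled token positions on which the feature fires at all. The sketch below assumes that definition (nonzero activation counts as firing); the dashboard does not state how it computes the number, and the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition, not taken from the dashboard: a position "fires"
    when its activation is strictly greater than the threshold.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 100,000 token positions, 74 of which activate the feature.
sample = [0.0] * 99_926 + [1.0] * 74
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.074% corresponds to roughly 74 firing positions per 100,000 tokens, which fits a feature tied to a narrow topic such as choking hazards.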