INDEX
Explanations
references to physical violence or abusive behavior
Negative Logits
tích     -0.16
abo      -0.16
tuk      -0.15
ira      -0.15
터       -0.15
tuyển    -0.14
521      -0.14
rijk     -0.14
Stick    -0.14
pak      -0.13
Positive Logits
pinned        0.18
struggling    0.17
aspers        0.16
struggles     0.16
struggle      0.16
submission    0.16
hold          0.16
suff          0.15
Dân           0.15
Hold          0.15
Activations Density 0.058%