Explanations
Various forms of, and references to, meat and animal products
Negative Logits
avaş     -0.15
ète      -0.15
iri      -0.14
slash    -0.13
ingo     -0.13
ashed    -0.13
евич     -0.13
uther    -0.13
indx     -0.13
_nth     -0.13
Positive Logits
STALL    0.15
ohen     0.15
况       0.15
upal     0.15
LLL      0.14
gra      0.14
strup    0.14
642      0.14
Blasio   0.14
опис     0.14
Activation density: 0.011%
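On sparse-autoencoder feature dashboards, the logit lists are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries, and the density figure is commonly the fraction of token positions on which the feature activates. The sketch below illustrates both calculations under those assumptions. The function names, the toy matrices, and the zero threshold are hypothetical, and it ignores details such as final-layer normalization that a real dashboard may account for.

```python
import heapq


def top_logits(decoder_dir, unembed, vocab, k=3):
    """Project a feature's decoder direction onto the unembedding (hypothetical helper).

    decoder_dir: list of d_model floats, the feature's decoder row (assumed).
    unembed: d_model x vocab_size matrix as a list of rows (assumed W_U).
    vocab: token strings, one per column of unembed.
    Returns (most_negative, most_positive), each a list of (token, logit) pairs.
    """
    d_model = len(decoder_dir)
    # Logit for token j is the dot product of the decoder direction with column j of W_U.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    most_negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])
    most_positive = heapq.nlargest(k, pairs, key=lambda p: p[1])
    return most_negative, most_positive


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold (assumed 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > threshold) / len(activations)


if __name__ == "__main__":
    # Toy data: a 2-dimensional model with a 4-token vocabulary (hypothetical).
    neg, pos = top_logits(
        [1.0, 0.5],
        [[1.0, -1.0, 0.5, 0.0], [0.0, 1.0, 0.5, -2.0]],
        ["a", "b", "c", "d"],
        k=2,
    )
    print(neg)  # [('d', -1.0), ('b', -0.5)]
    print(pos)  # [('a', 1.0), ('c', 0.75)]
    print(activation_density([0, 0, 0.3, 0, 0, 0, 0, 0, 0, 0]))  # 0.1
```

Read this way, a density of 0.011% would mean the feature fires on roughly 1 in every 9,000 tokens.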