INDEX
Explanations
words related to physical assault or violence
references to the word "mug" in various contexts
Negative Logits
BCE       -0.77
Phase     -0.76
RC        -0.70
phas      -0.69
COM       -0.68
EA        -0.68
CE        -0.68
urance    -0.67
loop      -0.67
׾         -0.66
Positive Logits
mug       4.13
Mug       2.26
jug       1.10
fug       1.10
robber    1.05
robbed    0.99
thug      0.98
bust      0.91
wig       0.90
burg      0.90
Activations Density 0.017%
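
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed. It assumes that "activations density" means the fraction of token positions on which the feature's activation is above zero; the page itself does not define it. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a value of the same size as the one reported.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 17 of 100,000 token positions.
acts = [0.0] * 99_983 + [1.5] * 17
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.017%
```

Under that assumption, a density of 0.017% corresponds to the feature firing on roughly 17 of every 100,000 tokens, which would fit a feature tied to a narrow set of words such as "mug".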