Explanations
instances of the word "ban" in various contexts
Negative Logits
<unused68>      -0.98
<unused52>      -0.97
<unused79>      -0.97
betweenstory    -0.97
<unused14>      -0.97
[@BOS@]         -0.97
<unused41>      -0.96
<unused8>       -0.96
<unused3>       -0.96
<unused16>      -0.96
Positive Logits
arm             0.68
ban             0.66
AS              0.58
arm             0.58
ban             0.56
AS              0.55
e               0.54
Ban             0.53
                0.51
Ban             0.50
Activations Density: 0.492%