Explanations
references to the word "bra."
Negative Logits
бот       -0.16
retched   -0.16
便        -0.15
uppy      -0.14
ogan      -0.14
yro       -0.14
lixir     -0.14
召        -0.14
astes     -0.14
309       -0.14
Positive Logits
ided      0.25
Bra       0.22
hma       0.21
bra       0.21
odcast    0.19
intree    0.19
BRA       0.19
Brah      0.18
bra       0.18
unsch     0.18
Activations Density 0.008%