INDEX
Explanations
instances of the phrase "don't be."
Negative Logits
aybe     -0.16
bage     -0.15
ghan     -0.14
ÑĢава    -0.14
lest     -0.14
abei     -0.14
ADI      -0.14
andi     -0.14
isses    -0.14
stoup    -0.14
Positive Logits
worry    0.18
/do      0.17
Go       0.16
Go       0.16
oice     0.16
forget   0.16
Allen    0.16
Allen    0.15
Forget   0.15
go       0.15
Activations Density 0.033%