INDEX
Explanations
mentions of bears
references to bears in various contexts
Negative Logits
anwhile    -0.85
lectic     -0.81
ij士       -0.74
ADRA       -0.73
icut       -0.72
inx        -0.72
enta       -0.71
arta       -0.71
ocrates    -0.71
yrim       -0.70
Positive Logits
bear       1.00
cub        0.94
claws      0.91
paws       0.90
hugs       0.89
Bears      0.87
paw        0.86
Grizz      0.85
hug        0.83
beit       0.78
Activations Density 0.011%