INDEX
Explanations
references to bears and bear-related terms
Negative Logits
Token      Logit
lrrrr      -0.77
Pelop      -0.69
Rau        -0.67
Schur      -0.64
UTERS      -0.64
Carthag    -0.63
Vlas       -0.63
ghed       -0.62
Rau        -0.62
Lapp       -0.61
Positive Logits
Token      Logit
Bears      1.10
Bear       1.04
BEAR       1.01
bears      0.98
Bear       0.98
bear       0.94
bearing    0.91
Bears      0.90
Bearing    0.89
BEAR       0.87
Activations Density: 0.419%
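
The page does not say how the activation density figure is derived. A common reading, assumed here, is the fraction of sampled token positions on which the feature fires above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 4 of which activate the feature.
sample = [0.0] * 996 + [0.8, 1.3, 0.2, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.419% would mean the feature is active on roughly 4 of every 1000 sampled tokens.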