INDEX
Explanations
references to specific entities or characters, particularly those associated with sports teams or political figures
references to a specific bear-related entity or theme
Negative Logits
orious    -0.71
ooks      -0.70
ortion    -0.70
ulton     -0.69
icular    -0.69
icer      -0.68
soc       -0.67
icles     -0.67
resent    -0.65
arma      -0.65
Positive Logits
Gry       0.85
opsis     0.83
Halls     0.80
Bear      0.74
Bear      0.72
brates    0.71
brate     0.71
vier      0.70
pta       0.69
plates    0.69
Activations Density 0.062%
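
As a rough illustration of what a density figure like this usually measures, the sketch below assumes it is the fraction of sampled tokens on which the feature's activation is above zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration; the page itself does not state how the value is computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (# active tokens) / (# sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, with the feature firing on 6 of them.
sample = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density well under 1% means the feature fires on very few tokens, which fits a feature tied to a narrow set of entities.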