INDEX
Explanations
names of individuals, particularly in the context of sports or entertainment
Negative Logits
Token       Logit
Berk        -0.72
Sports      -0.71
Abrams      -0.68
rug         -0.65
photos      -0.64
Ney         -0.64
sports      -0.63
RT          -0.62
Baseball    -0.62
Rost        -0.62
Positive Logits
Token       Logit
faith       0.81
unity       0.73
miss        0.72
ueless      0.70
incompet    0.69
utor        0.67
everyone    0.66
��          0.65
population  0.65
Issue       0.65
Activations Density: 0.097%