INDEX
Explanations
words related to sports and athletes
Negative Logits
Token          Logit
olves          -0.79
adh            -0.72
outputs        -0.67
¥µ             -0.63
versive        -0.61
animate        -0.59
nih            -0.59
ole            -0.59
appropriate    -0.59
feed           -0.58
Positive Logits
Token          Logit
meanwhile      1.24
however        1.13
flanked        1.07
enegger        0.99
who            0.98
pictured       0.96
nicknamed      0.94
whose          0.94
whose          0.92
who            0.92
Activations Density: 0.122%