INDEX
Explanations
words related to American football
Negative Logits
Token       Logit
İĭ          -0.86
pta         -0.82
etts        -0.76
emy         -0.73
ATURES      -0.72
ologies     -0.72
esome       -0.71
ysc         -0.70
yss         -0.70
ocamp       -0.69
Positive Logits
Token       Logit
ado         0.99
appreciated 0.96
simpler     0.93
nicer       0.82
NESS        0.79
easier      0.79
larger      0.79
smaller     0.78
else        0.78
cheaper     0.78
Activations Density: 0.307%