INDEX
Explanations
discussions about sports events and team strategies
Negative Logits
$$        -0.78
alde      -0.77
$$$$      -0.73
wart      -0.71
aunder    -0.66
utton     -0.66
adr       -0.66
arte      -0.65
oca       -0.64
illeg     -0.64
Positive Logits
Explain           0.99
Interview         0.93
impressions       0.81
Desc              0.80
autobi            0.79
recollection      0.79
experien          0.75
misconceptions    0.73
clarify           0.71
qualitative       0.71
Activations Density: 1.314%