Explanations
references to sports teams and their interaction with fans
Negative Logits
”?      -0.15
tero    -0.15
‘       -0.13
”—      -0.13
роп     -0.13
очно    -0.13
поки    -0.13
”       -0.13
[](     -0.13
‘       -0.13
Positive Logits
basically   0.28
I           0.25
but         0.25
really      0.25
-           0.24
because     0.24
obviously   0.24
whereas     0.23
yeah        0.23
.↵          0.22
Activations Density 0.093%