INDEX
Explanations
phrases indicating sports seasons or games
Negative Logits
Token      Logit
797        -0.15
adel       -0.14
utor       -0.14
Formatter  -0.14
才         -0.14
очеред     -0.14
ipc        -0.14
æĤ         -0.14
पत         -0.14
lech       -0.13
Positive Logits
Token      Logit
jas        0.16
esson      0.16
usch       0.16
ugh        0.15
wards      0.15
byss       0.15
bd         0.14
wei        0.14
ettel      0.14
itesse     0.14
Activations Density 0.024%