INDEX
Explanations
references to a specific abbreviation related to political positions
references to a specific sports venue or related events
Negative Logits
olar          -0.70
paran         -0.69
âĸĪâĸĪ        -0.69
lins          -0.68
illas         -0.66
cruel         -0.66
Lovecraft     -0.63
Python        -0.62
Farn          -0.62
âĸĪâĸĪâĸĪâĸĪ  -0.61
POSITIVE LOGITS
TD            4.19
TD            2.45
TDs           2.43
td            1.47
touchdown     1.39
touchdowns    1.17
DT            1.14
TC            1.12
TS            1.10
MLA           1.10
Activations Density 0.008%