INDEX
Explanations
terms related to a specific TV show, possibly "Saturday Night Live"
Negative Logits
Token       Logit
tons        -0.72
lessly      -0.64
Emirates    -0.64
Byrne       -0.64
Kali        -0.64
flies       -0.64
cens        -0.64
Emir        -0.62
Alonso      -0.61
Lama        -0.60
Positive Logits
Token       Logit
OW          1.01
SN          0.96
MP          0.91
ASH         0.88
ookie       0.88
ACK         0.88
igger       0.86
ipes        0.86
icket       0.86
azer        0.86
Activations Density 0.013%