INDEX
Explanations
references to popular television series and their characters
Negative Logits
妖        -0.17
thora     -0.16
ccoli     -0.15
Annunci   -0.14
Macron    -0.14
ста       -0.14
ken       -0.14
maç       -0.14
Penalty   -0.14
роф       -0.14
Positive Logits
meth          0.31
Walt          0.30
Breaking      0.28
Walter        0.28
meth          0.27
Breaking      0.26
Albuquerque   0.24
Jesse         0.24
Meth          0.23
Gale          0.23
Activations Density 0.003%