Explanations
mentions of individuals by their names or accounts on social media platforms
references to popular television shows and their characters
Negative Logits
Okinawa     -0.99
Sapp        -0.97
Buddhism    -0.93
Zen         -0.93
Buddhist    -0.91
Dresden     -0.86
Sega        -0.86
Psy         -0.86
Helsinki    -0.83
Japan       -0.83
Positive Logits
Oliver      2.26
Ol          1.79
Arrow       1.71
Stewart     1.53
Laurel      1.47
Stew        1.43
Canary      1.38
Colbert     1.26
Olsen       1.17
Stephen     1.17
Activations Density 0.209%