INDEX
Explanations
website and email domains
Negative Logits
Token      Logit
Scheme     -0.83
Genie      -0.77
Hai        -0.74
Philippe   -0.74
Flav       -0.74
Majesty    -0.73
Bout       -0.72
Stall      -0.72
Kut        -0.72
Lung       -0.71
Positive Logits
Token      Logit
news       1.08
politics   1.03
ribune     0.90
podcast    0.86
reports    0.85
abc        0.85
legraph    0.83
photos     0.82
blogs      0.82
journal    0.82
Activations Density: 0.063%
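
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading (it is not stated on this page) and counts any activation above zero as "firing". The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and synthetic, chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (tokens where the feature is active) / (all tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 100,000 tokens, of which 63 have a nonzero activation.
acts = [0.0] * 99_937 + [1.0] * 63
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.063%
```

A feature this sparse fires on roughly 6 tokens in every 10,000, which fits a narrow feature such as one tied to website and email domains.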