INDEX
Explanations
mentions of social issues and controversies, particularly around religion and politics
Negative Logits
ゴン         -0.62
ORPG         -0.59
olutions     -0.57
aukee        -0.56
Balanced     -0.55
Flavoring    -0.55
Defeat       -0.55
gat          -0.54
youtu        -0.52
ワン         -0.52
Positive Logits
alas             1.11
unsurprisingly   1.03
uh               0.98
indeed           0.85
albeit           0.85
unfortunately    0.84
um               0.84
frankly          0.84
admittedly       0.84
moreover         0.82
Activations Density 0.066%