INDEX
Explanations
phrases related to philosophical and political discussions
Negative Logits
undet       -0.76
neighb      -0.74
censored    -0.74
coral       -0.74
lifes       -0.73
spitting    -0.73
que         -0.73
ability     -0.73
bid         -0.72
bloc        -0.72
Positive Logits
Advertisement     1.86
Advertisements    1.62
Anyway            1.61
Related           1.60
However           1.60
What              1.59
Because           1.58
But               1.58
Unfortunately     1.58
Nevertheless      1.57
Activations Density 1.464%