INDEX
Explanations
words related to critique and analysis of political and cultural topics
Negative Logits
Bees          -0.79
tears         -0.74
bells         -0.67
oranges       -0.67
bows          -0.67
highlights    -0.66
Secrets       -0.65
Vaughn        -0.64
Inn           -0.64
Lynd          -0.64
Positive Logits
albeit        1.51
non           1.30
yet           1.26
sic           1.25
meaning       1.16
possibly      1.10
?)            1.06
rather        1.04
adult         1.04
atomic        1.03
Activations Density: 0.066%