Explanations
- Instances where someone is speaking out, expressing opinions or experiences
- Instances of speaking out or expressing opinions on various issues
Negative Logits
Token         Logit
Carbuncle     -0.83
atomic        -0.73
Jackets       -0.70
ipment        -0.67
Landing       -0.67
Gs            -0.63
jiang         -0.63
Rocket        -0.62
refres        -0.61
plet          -0.61

Positive Logits
Token         Logit
loudly         1.12
against        1.11
louder         1.11
forcefully     1.10
loud           0.99
anonymously    0.98
publicly       0.94
against        0.91
boldly         0.88
angrily        0.86
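The page does not say how these logit values are produced. One common approach for sparse-autoencoder dashboards, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the highest and lowest scores. The sketch below uses that assumption with made-up 2-dimensional vectors and hypothetical token names (`up_1`, `down_1`, and so on); it is not the dashboard's actual code, and real pipelines may also account for layer norm.

```python
# Hypothetical sketch: rank tokens by how strongly a feature's decoder
# direction pushes their logits up (positive) or down (negative).

def logit_weights(decoder_dir, unembed):
    """Dot the feature's decoder direction with each token's unembedding vector.

    decoder_dir: list of floats, length d_model (assumed SAE decoder row)
    unembed: dict mapping token -> list of floats, length d_model (assumed W_U columns)
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_and_bottom(weights, k):
    """Return the k most positive and the k most negative (token, weight) pairs."""
    positive = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(weights.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Toy example: made-up vectors in a 2-dimensional residual stream.
decoder_dir = [1.0, 0.5]
unembed = {
    "up_1": [1.0, 0.2],     # score 1.1
    "up_2": [0.8, 0.1],     # score 0.85
    "down_1": [-0.5, -0.3], # score -0.65
    "down_2": [-0.6, 0.0],  # score -0.6
}
pos, neg = top_and_bottom(logit_weights(decoder_dir, unembed), k=2)
print(pos)
print(neg)
```

Separate sorts for the two lists keep the negative list ordered from most negative upward, matching how the Negative Logits table is read.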
Activation Density: 0.065%
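Activation density is usually read as the share of token positions on which a feature fires. The page does not define the threshold, so the sketch below assumes "fires" means an activation strictly above zero; the function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold.

    The default threshold of 0.0 is an assumption, not the dashboard's definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy data: the feature fires on 2 out of 4,000 token positions.
acts = [0.0] * 4000
acts[10] = 3.2
acts[2500] = 1.7
print(f"{activation_density(acts):.3%}")  # 0.050%
```

At the page's 0.065% density, a feature like this would fire on roughly 65 of every 100,000 tokens, which is why it reads as a sparse, specific feature.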