Explanations
words related to politics and economics
Negative Logits
"],         -0.72
Solution    -0.68
aneous      -0.67
"]          -0.66
guiName     -0.64
Bott        -0.61
cause       -0.61
Spectre     -0.60
Passive     -0.59
Checking    -0.59
Positive Logits
barely        0.85
ostensibly    0.81
admittedly    0.80
arguably      0.80
starved       0.80
notoriously   0.79
desperately   0.78
rir           0.77
ordinarily    0.76
hardly        0.75
Activation Density 0.271%
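On the usual reading of feature dashboards like this one, activation density is the share of tokens in the evaluated dataset on which the feature fires, so 0.271% corresponds to roughly 2.7 of every 1,000 tokens. The sketch below is a minimal illustration of that calculation, not the dashboard's own code. It assumes the feature's per-token activations are already available as a list of floats and that "fires" means an activation strictly above a threshold of zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Hypothetical helper: assumes one activation value per token and treats
    any value strictly greater than `threshold` as the feature firing.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation to compute a density")
    # Count the tokens on which the feature fires.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Illustrative data: 1,000 tokens, of which 3 have a positive activation.
    acts = [0.0] * 997 + [1.2, 0.4, 3.1]
    density = activation_density(acts)
    print(f"{100 * density:.3f}%")  # prints "0.300%"
```

A fixed threshold of zero fits features whose activations pass through a ReLU, where every inactive token is exactly zero. Other setups may use a small positive cutoff instead, which the `threshold` parameter allows.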