Explanations
instances of the word "Us" with varying importance, potentially based on context
the term "Us" in various contexts
Negative Logits
Token        Logit
served       -0.82
*/(          -0.68
BART         -0.62
paralle      -0.62
chaired      -0.61
committee    -0.61
secrecy      -0.59
seed         -0.59
jail         -0.59
1945         -0.59
Positive Logits
Token        Logit
agi          1.00
Us           0.96
AGES         0.90
ween         0.86
awar         0.86
urers        0.84
ages         0.84
ubi          0.82
ability      0.82
selves       0.80
Activations Density: 0.004%