INDEX
Explanations
references to the word "us" and variations of it
Negative Logits
Token         Logit
Monfieur      -0.95
Houſe         -0.87
houſe         -0.85
pleaſure      -0.83
Charlemagne   -0.81
Efq           -0.81
purpoſe       -0.80
Karak         -0.78
Chy           -0.77
ſever         -0.77
Positive Logits
Token         Logit
us            1.51
Us            1.25
Us            1.08
We            1.04
we            0.92
We            0.85
WE            0.83
them          0.82
me            0.80
us            0.80
Activations Density 0.024%
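
A density figure like the one above can be read as the share of sampled token positions on which the feature fires. The dashboard does not state its exact definition, so the sketch below assumes "fires" means a strictly positive activation; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default). The dashboard may use a
    different rule.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 3 active positions out of 12,500 sampled tokens.
toy = [0.0] * 12_497 + [1.3, 0.4, 2.1]
density = activation_density(toy)
print(f"{density:.3f}%")
```

With this made-up sample, 3 of 12,500 positions are active, which is the same order of sparsity as the figure reported above: the feature is silent on the overwhelming majority of tokens.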