INDEX
Explanations
phrases indicating discussions about topics or concerns
Negative Logits
alve     -0.80
ſelf     -0.80
Reſ      -0.75
houſe    -0.72
poffe    -0.70
Conſ     -0.70
russes   -0.69
himſelf  -0.69
Diſ      -0.68
ſtate    -0.68
Positive Logits
about    1.88
ABOUT    1.74
ABOUT    1.59
About    1.59
About    1.49
about    1.45
abt      1.45
bout     1.43
Bout     1.29
Bout     1.21
Activations Density: 0.134%