Explanations
phrases that express uncertainty or conditionality
Negative Logits
Token          Logit
Efq            -0.83
tvguidetime    -0.81
Shakspeare     -0.80
sandero        -0.79
myſelf         -0.78
itſelf         -0.75
Cæsar          -0.74
་་             -0.74
Meiji          -0.74
Majefty        -0.74
Positive Logits
Token          Logit
it             1.15
the            1.05
there          1.03
we             0.85
nobody         0.81
this           0.80
neither        0.74
they           0.71
I              0.69
everyone       0.68
Activations Density 1.692%