Explanations
Phrases that suggest cause-and-effect relationships
Negative Logits
Majefty       -0.87
pleaſure      -0.86
myſelf        -0.85
Monfieur      -0.85
purpoſe       -0.84
Theſe         -0.82
Efq           -0.81
Shakspeare    -0.80
Anſ           -0.78
Cæsar         -0.77
Positive Logits
so            0.67
thus          0.60
因此          0.57
esez          0.57
editForm      0.56
Thus          0.56
therefore     0.55
hence         0.55
So            0.55
somit         0.54
Activations Density 0.291%
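
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is obtained: the fraction of sampled token positions where the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the toy data are all assumptions made for illustration; the dashboard's actual computation is not stated in the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires. The source does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 1000 token positions.
toy = [0.0] * 997 + [0.4, 1.2, 0.05]
print(f"{activation_density(toy):.3%}")  # prints 0.300%
```

Under this reading, a density of 0.291% would mean the feature is active on roughly 3 in every 1,000 sampled tokens.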