Explanations
instances of evidence or statements related to analysis and conclusions drawn from data or observations
Negative Logits
Token       Logit
pleaſure    -1.03
Monfieur    -0.92
houſe       -0.86
Jefus       -0.85
Cæsar       -0.83
theſe       -0.81
ſch         -0.79
ſtate       -0.79
ſhe         -0.77
greateſt    -0.77
Positive Logits
Token       Logit
was         0.81
did         0.79
didn        0.78
volna       0.75
ayer        0.75
damals      0.74
originally  0.73
previously  0.73
wasn        0.72
came        0.72
Activations Density: 6.137%