INDEX
Explanations
expressive language that emphasizes determination or insistence on a particular point
Negative Logits
Token      Logit
[]:        -0.51
istoitu    -0.50
(          -0.49
rang       -0.49
,          -0.48
</td>      -0.48
in         -0.47
<eos>      -0.47
had        -0.47
<          -0.46
Positive Logits
Token       Logit
pleaſure    1.09
Experiment  1.06
myſelf      1.04
itſelf      1.04
experiment  1.03
Experiment  0.99
expériment  0.99
purpoſe     0.98
Monfieur    0.97
ſtate       0.96
Activations Density 0.089%
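
To make the density figure concrete, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of tokens on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires;
    the dashboard does not spell out its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 89 of 100,000 tokens.
sample = [1.5] * 89 + [0.0] * (100_000 - 89)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.089% corresponds to the feature being active on roughly 89 out of every 100,000 tokens.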