INDEX
Explanations
phrases indicating attempts or efforts to accomplish something
Negative Logits
myſelf            -0.80
itſelf            -0.79
ValueGeneration   -0.74
pleaſure          -0.74
ſtate             -0.73
Futura            -0.71
Jefus             -0.70
himſelf           -0.68
occafion          -0.65
cauſe             -0.65
Positive Logits
attempt           1.39
attempts          1.31
trying            1.29
versucht          1.23
tries             1.22
Trying            1.20
versuchen         1.18
tentando          1.17
Trying            1.16
trying            1.15
Activations Density 0.120%
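
A rough sketch of how a density figure like the one above could be read, assuming it means the fraction of token positions on which the feature fires at all. The function name, the zero threshold, and the sample data are hypothetical illustrations, not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active, with "active" defined as strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 12 of which activate the feature.
acts = [0.0] * 9988 + [1.5] * 12
density = activation_density(acts)
print(f"{density:.3%}")
```

Under that reading, a density of 0.120% corresponds to roughly 12 active positions per 10,000 tokens, which would be consistent with a sparse feature that fires only on a narrow family of words.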