INDEX
Explanations
phrases indicating intentions or actions aimed at achieving specific goals
Negative Logits
ſtate       -0.89
ftate       -0.89
Majefty     -0.83
Efq         -0.80
fubject     -0.79
houſe       -0.79
pleaſure    -0.78
whoſe       -0.77
chofe       -0.76
poffe       -0.73
Positive Logits
Để             0.93
чтобы          0.88
为了           0.86
כדי            0.85
afin           0.85
Cyfeiriadau    0.82
Để             0.82
ůli            0.81
Чтобы          0.80
為了           0.79
Activations Density 0.096%