Explanations
phrases indicating capability or possibilities
Negative Logits
purpoſe      -0.82
Majefty      -0.80
occaf        -0.80
pleaſure     -0.79
houſe        -0.73
ſtate        -0.70
iscus        -0.68
fuper        -0.68
QRST         -0.67
Chriftian    -0.67
Positive Logits
cannot       0.79
Cannot       0.77
cannot       0.70
Cannot       0.68
Can          0.66
cant         0.65
can          0.64
CANNOT       0.64
Can          0.61
can          0.61
Activations Density: 0.610%
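
The page does not define how the density figure is computed. One common reading, assumed here, is the percentage of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 1000 tokens, 6 of which activate the feature.
toy = [0.0] * 994 + [0.3, 1.2, 0.7, 2.1, 0.05, 0.9]
density = activation_density(toy)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.610% would mean the feature fires on roughly 6 of every 1000 sampled tokens.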