INDEX
Explanations
phrases that indicate progress or advancement through different stages or levels
Negative Logits
abus    -0.18
ayne    -0.17
rowse   -0.17
_TA     -0.15
eza     -0.15
EDA     -0.15
اد      -0.14
714     -0.14
Morr    -0.13
Parcel  -0.13
POSITIVE LOGITS
Tro     0.17
tro     0.16
jer     0.16
Tro     0.15
hl      0.15
enia    0.15
onia    0.15
hil     0.15
aray    0.15
iant    0.14
Activations Density 0.016%