INDEX
Explanations
phrases related to leaving and being alone
Negative Logits
kel       -0.18
lav       -0.16
echa      -0.15
GU        -0.15
akh       -0.14
aÄį       -0.14
Missing   -0.14
nar       -0.14
igy       -0.14
Lease     -0.14
Positive Logits
alone       0.42
alone       0.35
Alone       0.34
-alone      0.29
aside       0.23
untouched   0.23
intact      0.22
à¹Ħว      0.20
unchanged   0.19
aside       0.17
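Token/logit lists like the ones above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. Whether this page was computed that way is an assumption. The sketch below only illustrates the idea, and the function name `logit_effects`, the two-dimensional vectors, and all numbers in it are hypothetical toy values.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction with that token's unembedding column, then sort
    from most promoted to most suppressed.

    decoder_dir: list of floats, one per hidden dimension (assumed input)
    unembed:     list of rows, one per hidden dimension, each with
                 len(vocab) entries (assumed layout)
    vocab:       list of token strings
    """
    scores = [
        sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        for j in range(len(vocab))
    ]
    return sorted(zip(vocab, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy example: 2 hidden dimensions, 3 tokens.
toy_vocab = ["alone", "aside", "kel"]
toy_decoder = [1.0, 0.5]
toy_unembed = [
    [0.5, 0.25, -0.25],  # hidden dimension 0
    [0.5, 0.0, -0.5],    # hidden dimension 1
]

ranked = logit_effects(toy_decoder, toy_unembed, toy_vocab)
for token, score in ranked:
    print(f"{token:10s} {score:+.2f}")
```

The head of the sorted list corresponds to a "positive logits" column and the tail to a "negative logits" column.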
Activations Density 0.051%
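An activation density figure is usually read as the fraction of token positions on which the feature fires at all. The sketch below assumes that definition and a firing threshold of zero. The page itself does not state either, and `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    activations: list of per-token activation values for one feature
    threshold:   assumed cutoff for counting a position as active
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 51 of 100,000 token positions.
toy_activations = [1.0] * 51 + [0.0] * 99_949
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is inactive on the vast majority of tokens.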