INDEX
Explanations
phrases indicating actions or responses that are conditional or dependent
Negative Logits
zw        -0.17
è±        -0.14
ZW        -0.14
aring     -0.14
fol       -0.14
zt        -0.14
(blank)   -0.14
amar      -0.14
anou      -0.14
prites    -0.14
Positive Logits
extremes  0.28
lengths   0.27
bed       0.19
sleep     0.19
task      0.19
trouble   0.18
places    0.17
lenght    0.17
movies    0.17
jail      0.17
Activations Density 0.053%
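
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a nonzero activation; the function name, the threshold parameter, and the sample data are all hypothetical and are not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, feature active on 5 of them.
acts = [0.0] * 10_000
for i in (3, 250, 4096, 7777, 9001):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.053% would correspond to the feature firing on roughly 5 of every 10,000 tokens.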