Explanations
phrases related to inability or difficulty in accomplishing tasks
Negative Logits
nis     -0.15
ASI     -0.15
Know    -0.14
etr     -0.14
Know    -0.14
eteria  -0.14
know    -0.14
curacy  -0.13
knows   -0.13
astes   -0.13
Positive Logits
find    0.25
figure  0.23
find    0.22
finds   0.22
stomach 0.21
FIND    0.20
找到    0.19
figure  0.18
Finds   0.18
.find   0.18
Activation density: 0.130%
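Assuming activation density means the share of tokens on which the feature has a positive activation (the usual meaning on feature dashboards, but not stated here), 0.130% corresponds to roughly 13 firing tokens per 10,000, which makes this a sparse feature. The sketch below computes that fraction; the function name `activation_density`, the zero threshold, and the sample data are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds threshold (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation")
    return active / total


# Made-up sample: 13 firing tokens out of 10,000.
acts = [0.0] * 9_987 + [1.0] * 13
density = activation_density(acts)
print(f"{density:.3%}")  # 0.130%
```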