INDEX
Explanations
phrases indicating lack of understanding or confusion
expressions of confusion or lack of understanding
Negative Logits
ridge        -0.72
quartered    -0.62
orney        -0.61
Britann      -0.60
envelope     -0.60
Designs      -0.60
velt         -0.59
ournal       -0.59
è¦ļéĨĴ       -0.59
afort        -0.58
Positive Logits
rid          0.98
anywhere     0.90
TING         0.78
ANY          0.76
anymore      0.76
bog          0.74
bothered     0.72
reimb        0.72
attr         0.72
Ĺ            0.72
Activations Density 0.064%