Explanations
expressions of confusion or difficulty in understanding concepts
Negative Logits
Token        Logit
постоянно    -0.24
always       -0.21
constantly   -0.21
siempre      -0.20
sempre       -0.20
всегда       -0.19
toujours     -0.19
always       -0.19
now          -0.18
一直         -0.18
Positive Logits
Token        Logit
simply       0.24
downright    0.22
even         0.21
sometimes    0.19
Sometimes    0.19
甚至         0.19
even         0.18
Simply       0.18
depending    0.18
sogar        0.18
Activations Density 0.236%