INDEX
Explanations
phrases indicating contrast or exception
Negative Logits
cken      -0.16
eteria    -0.15
erk       -0.15
â̦"↵↵    -0.15
heimer    -0.14
çĸij       -0.14
ëĿ½       -0.14
вÑĸлÑĮ    -0.14
εÏĦ       -0.14
olt       -0.14
Positive Logits
being     0.19
knowing   0.18
it        0.16
otic      0.16
its       0.16
which     0.15
fact      0.15
Ged       0.15
;         0.15
edy       0.14
Activations Density: 0.025%