Explanations
phrases indicating contrast or exceptions
Negative Logits
eton      -0.18
spath     -0.16
otti      -0.15
jam       -0.15
undle     -0.15
berger    -0.15
.Api      -0.14
ãģ¤ãģ¶    -0.14
onden     -0.14
sov       -0.14
Positive Logits
auga      0.17
CHAIN     0.17
sz        0.15
ainers    0.14
Neh       0.14
cid       0.14
Chan      0.13
Skinner   0.13
ls        0.13
atan      0.13
Activations Density 0.117%