Explanations
phrases and concepts related to consistency or change over time
Negative Logits
Token          Logit
SplitOptions   -0.16
oÄį            -0.14
illet          -0.14
ãĥ³ãĥĸ         -0.14
enso           -0.14
adh            -0.14
erland         -0.13
<main          -0.13
:both          -0.13
003            -0.13
Positive Logits
Token          Logit
same           0.49
same           0.43
åIJĮ           0.43
Same           0.40
identical      0.40
Same           0.38
_same          0.36
mismo          0.36
åIJĮãģĺ        0.36
缸åIJĮ         0.35
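
Several tokens in the logit lists (for example `oÄį` and `ãĥ³ãĥĸ`) look like the printable stand-ins that GPT-2-style byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below is a minimal way to turn such strings back into readable text, assuming that encoding is in fact what the dashboard shows. The function names `bytes_to_unicode` and `decode_token` are illustrative and are not taken from the dashboard.

```python
def bytes_to_unicode():
    """Build the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the page does not name its tokenizer)."""
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(0xA1, 0xAD))
          + list(range(0xAE, 0x100)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point from 256 upward.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map each stand-in character back to its byte, then decode as UTF-8.

    Raises KeyError if the string contains a character outside the table,
    which usually means the text was altered after export.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["oÄį", "ãĥ³ãĥĸ", "same"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, `ãĥ³ãĥĸ` decodes to the katakana `ンブ` and `oÄį` to `oč`. Tokens whose characters have been changed since export (a single-character ligature flattened into two ASCII letters, for instance) will not round-trip and need the original token string from the tokenizer vocabulary.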
Activations Density 0.162%