INDEX
Explanations
words and phrases indicating personal reflections or evaluations
Negative Logits
Token         Logit
Leer          -0.14
UC            -0.14
托            -0.14
Corner        -0.14
ross          -0.14
κά            -0.14
chain         -0.14
ĥģ            -0.13
convention    -0.13
heim          -0.13
Positive Logits
Token         Logit
fone          0.17
uso           0.16
'gc           0.15
/styles       0.15
edin          0.14
kins          0.14
ohl           0.14
oví           0.14
_invoke       0.14
olver         0.14
Activations Density 0.027%