INDEX
Explanations
numerical representations or classifications
Negative Logits
v        -0.23
l        -0.23
auf      -0.23
lut      -0.21
ré       -0.21
r        -0.21
aqu      -0.20
o        -0.20
i        -0.20
lk       -0.19
Positive Logits
obra     0.20
ensored  0.20
usp      0.19
ursive   0.19
ove      0.19
actus    0.19
rosso    0.19
zech     0.19
ypress   0.19
oven     0.19
Activations Density: 0.019%