INDEX
Explanations
phrases indicating ongoing problems or unresolved issues
Negative Logits
Rough     -0.17
xef       -0.16
frau      -0.15
vere      -0.14
urette    -0.14
ccb       -0.14
riba      -0.14
eru       -0.14
erner     -0.13
leh       -0.13
Positive Logits
still     0.25
Still     0.21
still     0.19
还是      0.19
Still     0.18
STILL     0.17
ainda     0.17
olland    0.16
-ie       0.15
.gg       0.15
Activations Density 0.249%