INDEX
Explanations
phrases that express contrast or opposition
Negative Logits
ancia      -0.07
.must      -0.07
zos        -0.07
sice       -0.07
ë¿IJ       -0.07
bak        -0.07
UGE        -0.07
NotNull    -0.06
ikke       -0.06
uang       -0.06
Positive Logits
actual      0.11
actually    0.09
actual      0.08
proper      0.08
Actually    0.08
Actual      0.08
Actually    0.08
羣æŃ£       0.07
real        0.07
(actual     0.07
Activations Density 0.039%