INDEX
Explanations
conveying core statements or rules
Negative Logits
:「	0.46
不想	0.45
はじめ	0.44
,「	0.43
!”	0.41
Anything	0.41
afterwards	0.41
这也是	0.41
nahin	0.41
ఇదే	0.41
Positive Logits
essentially	0.74
basically	0.71
básicamente	0.62
basically	0.62
simply	0.60
basicamente	0.58
principally	0.57
Basically	0.56
essentially	0.56
simplesmente	0.55
Activations Density 0.014%