INDEX
Explanations
phrases indicating uncertainty or lack of knowledge
Negative Logits
HRC          -0.66
Indoch       -0.64
rsiniz       -0.63
ergo         -0.62
Racine       -0.61
ARA          -0.59
lück         -0.58
Myra         -0.58
nont         -0.57
nonetheless  -0.56
Positive Logits
siquiera     0.87
barely       0.86
even         0.81
InputBorder  0.79
ogóle        0.71
发表于        0.71
EVEN         0.67
вообще       0.64
nemmeno      0.61
حتی          0.61
Activations Density 0.102%