Explanations
phrases indicating contrast or exceptions in information
Negative Logits
Token           Logit
aarrggbb        -0.66
ivelany         -0.57
ardından        -0.56
InitVars        -0.56
rrggbb          -0.55
一種            -0.53
verschiedener   -0.53
]++;            -0.53
sekaligus       -0.53
зулта           -0.51
Positive Logits
Token           Logit
few             1.82
Few             1.66
few             1.58
Few             1.57
nobody          1.51
none            1.47
hardly          1.40
FEW             1.29
pocos           1.25
neither         1.23
Activations Density: 0.660%