INDEX
Explanations
mentions of pairs or groups of items or concepts
Negative Logits
Various       -0.45
various       -0.43
all           -0.42
各种          -0.39
Various       -0.38
pelbagai      -0.38
berbagai      -0.37
variés        -0.37
various       -0.36
tất           -0.36
Positive Logits
two           0.78
two           0.77
deux          0.77
兩種          0.76
兩個          0.73
Two           0.71
ujednoznacz   0.71
zwei          0.71
两个          0.71
dvě           0.68
Activations Density 0.898%