INDEX
Explanations
the numeral representations of experimental phases or categories
Roman numeral II
Negative Logits
Token      Logit
chieht     -0.58
jadx       -0.58
ieteur     -0.50
zoude      -0.50
promessa   -0.49
범         -0.48
gömlek     -0.47
tekem      -0.47
aihe       -0.46
Grom       -0.46
Positive Logits
Token      Logit
II         1.86
II         1.56
III        1.27
III        1.10
ii         1.10
IIII       1.03
Ii         1.02
ii         0.98
IIA        0.96
Ⅱ          0.94
Activations Density 0.027%
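
As a rough illustration of the density figure above, the sketch below assumes that "activation density" means the fraction of sampled tokens on which the feature fires with a value above zero. That definition, the function name `activation_density`, and the synthetic sample are assumptions made for illustration; the page itself does not state how the number is computed.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample (not real data): 27 active tokens out of 100,000.
sample = [0.0] * 99_973 + [1.5] * 27
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.027% corresponds to the feature firing on roughly 27 of every 100,000 tokens.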