Explanations
terms associated with classifications and categories
Negative Logits
ir      -0.17
iii     -0.17
ii      -0.17
ib      -0.17
ip      -0.17
ig      -0.16
TRGL    -0.16
ia      -0.15
ic      -0.15
(ic     -0.15
Positive Logits
I       0.35
IS      0.35
IC      0.35
İ       0.35
Ðĺ      0.35
IF      0.34
IO      0.33
I       0.33
Ãį      0.33
IA      0.33
Activations Density 0.120%