Explanations
Roman numerals
references to specific labels or symbols associated with various entities or categories
Negative Logits
umerable      -0.81
kson          -0.78
keepers       -0.75
assetsadobe   -0.71
behavi        -0.70
unci          -0.70
Pelosi        -0.69
¯¯¯¯¯¯¯¯      -0.69
aiman         -0.69
cooperating   -0.69
Positive Logits
VII           1.22
XX            1.02
III           0.97
II            0.90
Sax           0.88
YY            0.87
XX            0.85
VI            0.83
xx            0.80
XY            0.80
Activations Density 0.019%