INDEX
Explanations
quantitative descriptions of characteristics or events
Negative Logits
agate       -0.77
oran        -0.73
alis        -0.73
ourses      -0.67
City        -0.67
Dynamics    -0.66
oris        -0.66
City        -0.65
messenger   -0.65
endi        -0.63
Positive Logits
identical          0.90
unanimous          0.88
impossible         0.81
nonexistent        0.79
thood              0.78
limitless          0.78
unanimously        0.77
zero               0.77
certainly          0.76
indistinguishable  0.75
Activations Density 0.904%