Explanations
names of people or specific individuals
Negative Logits
prus            -0.62
captcha         -0.57
cients          -0.53
anwhile         -0.53
unrestricted    -0.51
buzzing         -0.51
redistributed   -0.51
deficits        -0.50
collisions      -0.50
ancies          -0.50
Positive Logits
*,               1.11
,                0.93
QC               0.91
Jr               0.90
!,               0.88
?,               0.88
Sr               0.86
,                0.85
,,               0.85
,[               0.83
Activation density: 0.253%
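Activation density is usually read as the fraction of tokens on which the feature fires. At 0.253%, that is roughly 2.5 tokens per 1,000. The sketch below assumes "fires" means an activation strictly greater than zero. The `threshold` parameter and the `activation_density` name are hypothetical, not taken from the dashboard.

```python
import numpy as np

def activation_density(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumes a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default, a hypothetical choice).
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Example: 253 active tokens out of 100,000 gives 0.00253, i.e. 0.253%.
acts = np.zeros(100_000)
acts[:253] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")
```

A low density like this means the feature is sparse: it is silent on the vast majority of tokens and fires only in narrow contexts.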