Explanations
numbers that correspond to specific information or data points
Negative Logits
Token        Logit
',           -0.67
,"           -0.67
comprom      -0.66
",           -0.63
milo         -0.62
cooperative  -0.62
naissance    -0.60
superst      -0.60
positively   -0.58
cho          -0.57
Positive Logits
Token  Logit
][     1.82
]      1.71
]"     1.39
].     1.35
])     1.33
]).    1.32
],[    1.31
]'     1.27
]:     1.21
]),    1.21
Activations Density: 0.035%