Explanations
phrases indicating the basis or reasoning for decisions or statements
Negative Logits
Token         Logit
basics        -0.15
actionTypes   -0.15
oval          -0.14
973           -0.14
eyn           -0.14
neum          -0.14
over          -0.14
uts           -0.14
bás           -0.14
bravery       -0.13
Positive Logits
Token         Logit
upon          0.37
upon          0.30
Upon          0.28
Upon          0.25
off           0.20
solely        0.18
Sole          0.17
around        0.17
camp          0.16
loosely       0.16
Activations Density: 0.026%