INDEX
Explanations
questions or statements related to observing, predicting, or analyzing outcomes
inquiries that express curiosity or uncertainty about outcomes
Negative Logits
assisted    -0.74
"}],"       -0.70
igned       -0.64
anked       -0.64
ul          -0.63
stated      -0.61
forth       -0.61
Saharan     -0.60
UTH         -0.59
åij         -0.59
Positive Logits
reaction     0.69
trends       0.66
reactions    0.63
mismatch     0.63
attrition    0.62
iasm         0.61
vine         0.61
extent       0.61
feasibility  0.60
neigh        0.60
Activations Density: 0.092%