Explanations
phrases related to deducing or relying on information
references to decisions or conclusions that are based on specific criteria or data
Negative Logits
anamo     -0.78
pload     -0.75
scribe    -0.74
rolet     -0.71
erers     -0.71
anke      -0.70
apo       -0.70
asking    -0.70
robe      -0.70
joy       -0.70
Positive Logits
assumptions     1.08
criteria        1.04
assumption      1.04
observations    1.03
principles      0.99
evaluations     0.96
feedback        0.95
whims           0.95
premise         0.91
intuition       0.90
Activations Density: 0.386%