INDEX
Explanations
phrases related to providing reasons or justifications
phrases that reference criteria or reasons for judgments and decisions
Negative Logits
ascus     -0.73
aspers    -0.67
rys       -0.66
stall     -0.65
knit      -0.65
ortium    -0.64
nova      -0.64
byter     -0.63
nery      -0.60
jer       -0.60
Positive Logits
assumption    0.83
occasions     0.75
either        0.72
premise       0.69
basis         0.66
behalf        0.65
footing       0.65
grounds       0.64
whim          0.64
GUI           0.64
Activations Density 0.039%