Explanations
phrases related to providing support or reinforcement
phrases related to support and backing in arguments or claims
Negative Logits
Ĥİ              -0.67
killed          -0.66
ograms          -0.65
ilet            -0.64
antis           -0.63
resp            -0.63
soDeliveryDate  -0.63
orks            -0.62
eez             -0.62
ppers           -0.60
Positive Logits
presumption     0.80
belief          0.79
assumption      0.75
conviction      0.73
ament           0.72
weights         0.71
Belief          0.70
disbelief       0.69
inference       0.69
lined           0.67
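The two lists above read as the tokens whose logits this feature most decreases and most increases. Below is a minimal sketch of that ranking. It assumes each token already has a single logit-effect score, and the dashboard's method for deriving that score is not stated here. `top_and_bottom_k` is a hypothetical helper, and the example uses a few of the values listed above.

```python
def top_and_bottom_k(logit_effects: dict[str, float], k: int):
    """Hypothetical helper: return the k highest- and k lowest-scoring tokens.

    `top` is ordered from most positive down; `bottom` from most negative up.
    """
    if k <= 0:
        return [], []
    # Sort all tokens by score, largest first.
    ranked = sorted(logit_effects.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    # Take the k smallest and list the most negative first.
    bottom = sorted(ranked[-k:], key=lambda item: item[1])
    return top, bottom


# A few (token, score) pairs copied from the lists above.
effects = {
    "presumption": 0.80,
    "belief": 0.79,
    "assumption": 0.75,
    "Ĥİ": -0.67,
    "killed": -0.66,
    "ograms": -0.65,
}
top, bottom = top_and_bottom_k(effects, 2)
print(top)     # [('presumption', 0.8), ('belief', 0.79)]
print(bottom)  # [('Ĥİ', -0.67), ('killed', -0.66)]
```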
Activation Density: 0.415%
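The density figure is most likely the share of tokens on which the feature is active. Below is a minimal sketch that assumes "active" means an activation strictly above zero. The dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 8 tokens: the feature fires on 1 of them.
acts = [0.0, 0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 12.5
```

Under this reading, a density of 0.415% means the feature fires on roughly 4 of every 1,000 tokens.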