INDEX
Explanations
phrases indicating physical injuries or conditions
Negative Logits
<+              -0.73
rir             -0.65
Reviewer        -0.63
amiya           -0.63
characterized   -0.63
.–              -0.62
arthy           -0.61
hops            -0.61
ulative         -0.59
ghai            -0.58
Positive Logits
impunity        1.11
dignity         1.02
draw            0.97
drawn           0.97
stood           0.88
rosis           0.85
regard          0.85
regards         0.85
holding         0.85
respect         0.84
Activations Density 0.057%