INDEX
Explanations
adjectives indicating importance or necessity
terms indicating importance or necessity
Negative Logits
arij       -0.71
vomit      -0.69
vom        -0.69
renheit    -0.68
flat       -0.66
sweats     -0.64
elf        -0.61
çİĭ        -0.59
uclear     -0.59
qus        -0.58
Positive Logits
determining       0.79
deterrent         0.78
adjunct           0.78
safeguard         0.77
because           0.73
motivating        0.72
for               0.71
distinguishing    0.71
integral          0.70
limiting          0.70
Activations Density: 0.102%