INDEX
Explanations
passive actions or states described using specific verbs
phrases indicating obligations, tendencies, and future actions related to individuals
Negative Logits
polit        -0.74
politics     -0.69
entails      -0.64
transpired   -0.63
heals        -0.58
),"          -0.57
cler         -0.56
aman         -0.55
/+           -0.54
geop         -0.53
Positive Logits
ŃĶ            0.80
nevertheless  0.74
attest        0.73
Ĥİ            0.69
nonetheless   0.69
likewise      0.67
themselves    0.65
ledged        0.61
hap           0.60
ushima        0.60
Activations Density: 0.532%