Explanations
expressing admiration and alignment
Negative Logits
whims  0.42
حب  0.41
hypnot  0.40
ⵕ  0.39
どんどん  0.39
वनस्पती  0.39
liquides  0.39
لت  0.38
taxpayer  0.38
ensee  0.38
Positive Logits
precisely  0.46
aligns  0.45
reputed  0.45
reputation  0.44
reput  0.43
legendary  0.42
joining  0.42
만큼  0.41
admired  0.40
compatible  0.40
Activations Density 0.033%