Explanations
possessive + noun for state or action
Negative Logits
is            0.22
becomes       0.22
fluctuates    0.22
<0xB1>        0.21
emits         0.21
wordt         0.21
ི             0.21
είναι         0.21
sputtered     0.21
osexuality    0.20
Positive Logits
penchant      0.27
realist       0.24
многочис      0.24
austere       0.24
eclectic      0.23
insistence    0.23
insgesamt     0.23
longstanding  0.23
overarching   0.23
conviction    0.23
Activations Density 0.458%