INDEX
Explanations
adverbs indicating approximation or universality
Negative Logits
orem       -0.16
claimer    -0.15
ogan       -0.15
oton       -0.15
elif       -0.15
пу         -0.14
居         -0.14
ám         -0.14
amental    -0.14
arp        -0.14
Positive Logits
exclusively    0.25
entirely       0.21
entire         0.20
identical      0.18
대부분         0.17
everyone       0.17
everything     0.17
complete       0.17
everybody      0.17
completely     0.16
Activations Density 0.078%