INDEX
Explanations
phrases indicating recognition or high regard for subjects in various contexts
Negative Logits
insky      -0.17
nell       -0.14
Myers      -0.14
mer        -0.14
/github    -0.13
ester      -0.13
abl        -0.13
Muk        -0.13
αι         -0.13
weakest    -0.13
Positive Logits
igel       0.15
amongst    0.15
among      0.15
Incre      0.15
ATUS       0.14
Andreas    0.14
Among      0.14
ndata      0.14
ims        0.14
atk        0.14
Activations Density: 0.203%