Explanations
words related to demonstration or evidence of capability
Negative Logits
Token      Logit
pData      -0.17
omo        -0.17
eler       -0.16
ابت        -0.16
omi        -0.15
ponsive    -0.15
amous      -0.14
iff        -0.14
itet       -0.14
inox       -0.14
Positive Logits
Token      Logit
stration   0.34
strate     0.28
/demo      0.19
ikan       0.17
strar      0.17
iah        0.16
419        0.16
principle  0.15
Bread      0.15
ait        0.15
Activations Density: 0.024%