INDEX
Explanations
phrases that indicate individual or collective participation in processes or systems
Negative Logits
sr                  -0.57
<eos>               -0.56
.                   -0.56
ilim                -0.55
“                   -0.54
/                   -0.54
↵↵                  -0.52
li                  -0.52
(token not shown)   -0.52
ö                   -0.52
Positive Logits
itſelf   1.49
each     1.46
EACH     1.41
Chaque   1.39
each     1.38
Chaque   1.36
Each     1.36
Each     1.36
EACH     1.35
Ogni     1.35
Activations Density 0.354%
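
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed, assuming it means the fraction of sampled tokens on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" counts a token as active when the feature's
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy sample: 354 active tokens out of 100,000.
sample = [1.0] * 354 + [0.0] * 99_646
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.354%
```

Under this reading, a density of 0.354% means the feature fires on roughly 1 in every 280 tokens sampled.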