Explanations
phrases that indicate work, effort, or complex interactions within narratives
Negative Logits
repid  -0.14
همچنین  -0.14
звичай  -0.14
بواسطة  -0.13
řes  -0.13
قه  -0.13
-awesome  -0.12
ismet  -0.12
وار  -0.12
орая  -0.12
Positive Logits
too  1.25
too  1.09
Too  1.03
TOO  1.02
Too  0.99
太  0.91
-too  0.88
слишком  0.80
demasi  0.80
太  0.75
Activations Density 0.659%
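
The density figure can be read as a simple ratio. The sketch below assumes that "activations density" means the share of sampled token positions on which the feature has a nonzero activation; the source does not define the term, so treat that reading, the function name `activation_density`, and the sample values as illustrative assumptions rather than the dashboard's actual computation.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 token positions, 7 of which activate the feature.
sample = [0.0] * 993 + [0.4, 1.2, 0.1, 2.5, 0.8, 0.3, 1.9]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.700%"
```

Under this reading, a density of 0.659% would correspond to the feature firing on roughly 66 out of every 10,000 sampled tokens.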