Explanations
phrases indicating variability or frequency in different contexts
Negative Logits
Token       Logit
isObject    -0.15
334         -0.15
restraint   -0.14
:↵↵↵↵       -0.14
ir          -0.14
ÙĪØº        -0.14
ahl         -0.13
umer        -0.13
yne         -0.13
Ī           -0.13
Positive Logits
Token       Logit
orious      0.15
ANCE        0.15
ussen       0.14
ladu        0.14
inton       0.14
ISTER       0.14
#error      0.14
Ŀ           0.14
cases       0.14
μία         0.14
Activations Density: 0.073%