INDEX
Explanations
phrases representing different categories or options
phrases related to conditional statements or options
Negative Logits
Token       Logit
Loading     -0.71
cv          -0.57
HCR         -0.56
ACT         -0.55
Sy          -0.54
Build       -0.54
RAW         -0.53
SA          -0.53
ibr         -0.53
expected    -0.53
Positive Logits
Token       Logit
one         1.69
one         1.56
ONE         1.35
two         1.32
two         1.32
One         1.27
TWO         1.25
One         1.14
another     1.14
Two         1.11
Activations Density: 0.168%