Explanations
descriptive words after 'and'
Negative Logits
istoitu       -0.69
<unused43>    -0.66
<unused14>    -0.66
<unused8>     -0.66
<unused28>    -0.66
<unused23>    -0.66
[@BOS@]       -0.66
<unused68>    -0.66
<unused51>    -0.66
<pad>         -0.66
Positive Logits
0.42
bright
0.40
său
0.36
“
0.34
”
0.33
<strong>
0.33
bright
0.33
cool
0.32
&
0.32
...
0.31
Activations Density: 0.048%