Explanations
negations and conditions in statements
Negative Logits
.              -0.57
.              -0.55
"              -0.51
...            -0.50
inoltre        -0.48
UnusedPrivate  -0.48
fsp            -0.47
?              -0.47
:              -0.47
donc           -0.46
Positive Logits
itſelf          1.02
myſelf          0.97
$_"             0.91
fevere          0.85
technically     0.85
{},             0.84
ſmall           0.83
Reſ             0.83
fubject         0.83
houſe           0.83
Activation Density: 0.319%