Explanations
let me introduce an action or explanation
Negative Logits
ederal      -0.90
negozio     -0.89
Synd        -0.88
ragazzo     -0.88
selam       -0.88
konfig      -0.88
maksimum    -0.87
ystema      -0.86
érées       -0.86
ïc          -0.85
Positive Logits
be          1.58
know        1.45
tell        1.32
help        1.09
first       1.08
explain     1.08
ราบ         1.05
please      1.01
dieran      0.96
сейчас      0.96
Activations Density: 0.012%