INDEX
Explanations
concepts related to pride and identity
Negative Logits
,                     -0.46
1                     -0.45
.                     -0.42
in                    -0.40
[no visible token]    -0.39
(                     -0.39
or                    -0.39
with                  -0.39
div                   -0.39
of                    -0.38
Positive Logits
فريبيس                1.16
snippetHide           1.10
tagHelperRunner       1.07
<unused79>            1.06
<unused74>            1.06
<unused52>            1.06
<unused14>            1.06
<unused8>             1.05
<unused3>             1.05
[@BOS@]               1.05
Activations Density 0.264%