INDEX
Explanations
references to individuals and their actions in various contexts
Negative Logits
fun     -0.51
y       -0.46
is      -0.46
I       -0.45
Y       -0.44
Fun     -0.42
(       -0.42
buie    -0.42
+       -0.42
ابر     -0.40
Positive Logits
ftagPool        0.87
$_"             0.83
ſhe             0.80
Efq             0.78
itſelf          0.77
ScopeManager    0.76
doubtnut        0.75
Majefty         0.74
saraba          0.74
becauſe         0.73
Activations Density: 0.345%