INDEX
Explanations
the words "personally" and "filepath"
personally
Negative Logits
Token    Logit
.        -0.92
(        -0.82
,        -0.71
         -0.69
(        -0.68
         -0.66
y        -0.65
-        -0.64
:        -0.63
-        -0.62
Positive Logits
Token       Logit
myſelf      1.52
Efq         1.47
itſelf      1.45
Monfieur    1.39
Jefus       1.38
$_"         1.38
purpoſe     1.38
ſelves      1.37
ſelf        1.37
"]);        1.35
Activations Density 0.792%