Negative Logits
pleaſure      -0.96
^(@)          -0.96
+#+#          -0.93
greateſt      -0.86
#+#           -0.82
Efq           -0.82
ſeveral       -0.81
itſelf        -0.79
houſe         -0.77
providedIn    -0.76
Positive Logits
<bos>         1.44
'             0.48
jan           0.47
the           0.46
try           0.45
di            0.43
ΙΑ            0.42
mopol         0.42
Letter        0.42
plan          0.42
Activations Density: 0.879%