INDEX
Explanations
dialogue and statements made by individuals
Negative Logits
fur       -0.15
ysa       -0.15
iaux      -0.15
iona      -0.14
wonder    -0.14
simplex   -0.14
usta      -0.14
sip       -0.14
нка       -0.14
catch     -0.13
Positive Logits
"(        0.16
hti       0.16
quared    0.16
ÙĦا       0.15
oola      0.14
ulis      0.14
aro       0.14
ulen      0.14
"[        0.14
ãĥ¬ãĥ³    0.14
Activations Density 0.035%