INDEX
Explanations
informal phrases and conversational opinions about various topics
Negative Logits
Token                Logit
.                    -0.56
W                    -0.55
↵                    -0.54
w                    -0.45
(token not shown)    -0.45
?.                   -0.42
);                   -0.42
\                    -0.41
");                  -0.41
\]                   -0.40
Positive Logits
Token                Logit
myſelf               1.02
Мексичка             0.99
―――――                0.98
houſe                0.91
ſmall                0.91
ſche                 0.89
leſs                 0.89
raiſ                 0.89
pleaſure             0.88
Савезне              0.86
Activations Density: 0.220%