Explanations
phrases indicating prompt actions or responses
Negative Logits
ModelExpression    -0.90
vertes             -0.76
autorytatywna      -0.73
kuuta              -0.73
χρι                -0.68
reposer            -0.65
y                  -0.65
Alan               -0.64
ๆ                  -0.64
:[]                -0.63
Positive Logits
Immediate          0.95
Immediate          0.95
immedi             0.94
immediate          0.93
immédi             0.90
IMMEDIATE          0.89
immediate          0.88
aneous             0.84
immediately        0.84
)|^{               0.82
Activations Density 0.055%