INDEX
Explanations
phrases indicating surprise or unexpected outcomes
Negative Logits
PeEnEo       -0.62
akka         -0.55
<bos>        -0.54
thâu         -0.53
OFDb         -0.52
hoppas       -0.51
EconPapers   -0.49
wery         -0.49
Kontrola     -0.48
anderen      -0.48
Positive Logits
unsur          0.73
الدراسه        0.58
defStyle       0.58
inevitable     0.57
inevitably     0.56
Risultati      0.55
vodu           0.53
contentLoaded  0.52
anthrene       0.52
"}";           0.51
Activations Density: 0.234%