Explanations
variations of punctuation and formatting in text
Negative Logits
bootstrapcdn    -0.77
nawr            -0.59
nephe           -0.58
المعيارى        -0.57
nevertheless    -0.56
itſelf          -0.55
счита           -0.54
philosop        -0.53
oprot           -0.53
يكب             -0.53
Positive Logits
<bos>           0.90
')}}">          0.87
__(/*!          0.76
]").            0.74
"]);            0.74
"]));           0.72
]]:             0.72
//              0.72
})$}            0.71
“               0.71
Activations Density 0.048%