INDEX
Explanations
phrases that indicate evidence or inferences based on observations or results
Negative Logits
AndEndTag  -0.68
fubject  -0.60
Cæsar  -0.55
Carthage  -0.54
hoor  -0.51
astanza  -0.51
poland  -0.51
purpoſe  -0.50
ſtate  -0.50
Eury  -0.50
Positive Logits
thus  0.66
thereby  0.63
незавершена  0.58
hence  0.58
Билгалдахарш  0.57
Šaltiniai  0.57
somit  0.56
Thus  0.56
новниш  0.56
tagHelperRunner  0.55
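Logit lists like the ones above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. The pure-Python sketch below assumes a decoder vector of length d_model and an unembedding given as d_model rows by vocab_size columns; the helper names `logit_effects` and `top_and_bottom` are hypothetical, and a real dashboard may apply extra normalization not shown here.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction with that token's unembedding column.

    Assumed shapes (hypothetical): decoder_dir has d_model entries,
    unembed is d_model rows x vocab_size columns, vocab has vocab_size tokens.
    """
    scores = []
    for j, token in enumerate(vocab):
        # Dot product over the model dimension for column j.
        s = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        scores.append((token, s))
    return scores


def top_and_bottom(scores, k):
    """Return (k most negative, k most positive) (token, score) pairs."""
    ranked = sorted(scores, key=lambda ts: ts[1])  # ascending by score
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return bottom, top


if __name__ == "__main__":
    # Toy example: 2-dimensional model, 3-token vocabulary.
    scores = logit_effects([1.0, 0.0],
                           [[0.5, -0.2, 0.9], [0.0, 1.0, 1.0]],
                           ["thus", "Caesar", "hence"])
    print(top_and_bottom(scores, 1))
```

Ranking by the raw dot product keeps the sketch simple; it shows why connective tokens such as "thus" and "hence" can land at the positive end when the decoder direction aligns with their unembedding columns.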
Activation Density: 0.433%
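Activation density is the fraction of tokens on which the feature fires, so 0.433% corresponds to roughly 1 token in every 231 (1 / 0.00433 ≈ 231). The minimal sketch below assumes "fires" means an activation strictly above a threshold of zero; `activation_density` is a hypothetical helper, not a dashboard API.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    Assumption (hypothetical): a feature counts as active on a token when
    its activation is strictly greater than the threshold (default 0).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    acts = [0.0, 0.0, 1.2, 0.0]  # feature fires on 1 of 4 tokens
    print(f"{100 * activation_density(acts):.3f}%")
```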