INDEX
Explanations
Phrases that cite or refer back to previously mentioned statements
Negative Logits
ſte          -0.58
myſelf       -0.54
faſt         -0.54
ſever        -0.53
Anſ          -0.53
themſelves   -0.52
itſelf       -0.51
ſta          -0.50
againſt      -0.49
tranſ        -0.49
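Positive Logits
dicha        0.75   (Spanish: "said, aforementioned")
dichas       0.66   (Spanish, feminine plural of "dicha")
該           0.64   (Traditional Chinese: "the said, the aforementioned")
respective   0.64
该           0.62   (Simplified Chinese: "the said, the aforementioned")
该           0.62
данного      0.61   (Russian: "of the given")
dichos       0.60   (Spanish, masculine plural of "dicho")
aforesaid    0.57
данный       0.55   (Russian: "given, this")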
Activation density: 0.020%