Explanations
phrases indicating strong personal opinions or convictions
Negative Logits
Token              Logit
цездатний          -0.78
Autoritní          -0.74
出版年              -0.70
ſammen             -0.63
IntoConstraints    -0.62
تانيه              -0.61
السكان             -0.60
Италијани          -0.59
Personensuche      -0.59
ViewFeatures       -0.58

Positive Logits
Token              Logit
“                  0.52
The                0.42
Is                 0.37
“                  0.37
“…                 0.37
'                  0.36
’                  0.36
<bos>              0.36
For                0.36
"                  0.35
Activations Density: 0.190%