INDEX
Explanations
references to individuals or groups of people
Negative Logits
Савезне           -1.02
myſelf            -0.81
Monfieur          -0.76
Vidite            -0.73
CreateTagHelper   -0.72
érience           -0.71
ainfi             -0.71
ſche              -0.70
himſelf           -0.70
Anſ               -0.68
Positive Logits
它們      0.86
它们      0.80
its       0.73
它们的    0.72
it        0.72
their     0.70
它        0.64
它        0.61
їх        0.60
their     0.56
Activations Density 0.407%
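
The page does not define the density figure. A common reading, assumed here, is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means (active positions) / (all positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 4 of which are active.
toy = [0.0] * 996 + [1.3, 0.2, 2.7, 0.9]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.407% would mean the feature is active on roughly 4 of every 1000 sampled tokens.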