INDEX
Explanations
elements related to strategies, evaluations, or categories
Negative Logits
resourceCulture   -0.54
للمعارف   -0.52
geçir             -0.44
DS                -0.44
HasAnnotation     -0.43
CloseOperation    -0.42
thích             -0.40
and               -0.40
限                -0.39
Ri                -0.38
Positive Logits
alſo              1.02
alfo              0.90
himſelf           0.89
ſtate             0.87
Monfieur          0.85
themſelves        0.82
also              0.81
niająca           0.81
myſelf            0.79
fhort             0.76
Activations Density 1.578%