Explanations
mentions of individuals in roles related to authority, such as officials and spokespersons
Negative Logits
transfieras        -0.67
protoimpl          -0.61
surla              -0.61
resourceCulture    -0.60
createSlice        -0.59
Италијани          -0.59
مشين               -0.58
[token not shown]  -0.58
ddelweddau         -0.57
transQ             -0.56
Positive Logits
said         0.55
explained    0.44
explains     0.41
says         0.41
وقال         0.41
said         0.39
trem         0.38
StringLen    0.35
told         0.35
sti          0.34
Activations Density 0.154%