INDEX
Explanations
making official declarations
Negative Logits
hành             0.43
表现             0.42
persuasion       0.42
menggambarkan    0.40
குறிப்பி         0.40
表現             0.39
roug             0.39
Beschreibung     0.39
okre             0.38
prototyping      0.38
Positive Logits
allegiance       0.82
loudly           0.70
declare          0.68
intentions       0.63
aloud            0.59
readiness        0.59
宣言             0.59
declare          0.58
candidacy        0.58
declarations     0.58
Activations Density: 0.009%