Explanations
references to items or concepts indicated by demonstrative pronouns
Negative Logits
Token            Logit
Kleidung         -0.63
informaci        -0.61
zieży            -0.55
gouttes          -0.55
pouvoirs         -0.55
чик              -0.54
barnet           -0.53
गया              -0.52
gouvernements    -0.51
thư              -0.49
Positive Logits
Token            Logit
Theſe            1.11
NameInMap        0.99
theses           0.98
autorytatywna    0.96
Theses           0.94
للاسماء          0.94
these            0.93
Những            0.91
These            0.90
%")              0.88
Activations Density 0.157%