INDEX
Explanations
contexts involving identification and interpretation of entities or phenomena
Negative Logits
rin          -0.18
FU           -0.15
ีล           -0.15
ILE          -0.14
VARIABLES    -0.14
ofday        -0.14
طر           -0.13
HORT         -0.13
олÑĥÑĩ       -0.13
rove         -0.13
Positive Logits
odox         0.15
ój           0.14
Concat       0.14
riere        0.14
Greene       0.14
anda         0.14
ellig        0.13
áž           0.13
.yy          0.13
catholic     0.13
Activations Density: 0.292%