INDEX
Explanations
references to daily life experiences and activities
Negative Logits
Token          Logit
231            -0.16
uD             -0.15
展             -0.15
eldon          -0.15
ше             -0.14
/Base          -0.14
overs          -0.14
acam           -0.14
235            -0.14
correctness    -0.14
Positive Logits
Token          Logit
spent          0.17
ardon          0.16
yang           0.15
cation         0.15
angep          0.15
рождения       0.15
aversable      0.14
вот            0.14
spent          0.14
aryl           0.13
Activations Density 0.082%
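
The density figure above is commonly read as the fraction of sampled token positions on which the feature fires. The sketch below illustrates that reading only. The function name `activation_density`, the zero threshold, and the sample data are assumptions made for illustration; they are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active (activation > threshold).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 82 of 100,000 sampled positions.
sample = [1.5] * 82 + [0.0] * 99_918
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.082%"
```

Under this reading, a density of 0.082% means the feature is active on roughly 8 of every 10,000 tokens, so it is quite sparse.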