Explanations
recommendations and suggestions regarding actions or choices
Negative Logits
indi                  -0.16
indy                  -0.15
isc                   -0.14
nor                   -0.14
rome                  -0.14
ingular               -0.14
ÙĪØ§ÙĦتÙĬ (والتي)   -0.14
ampion                -0.14
inf                   -0.13
inement               -0.13
Positive Logits
åIJ§ (吧)              0.37
yourself              0.24
nhé                   0.23
lah                   0.20
yourselves            0.19
ìĦ¸ìļĶ (세요)         0.18
lah                   0.18
accordingly           0.17
íķĺìĦ¸ìļĶ (하세요)    0.17
ä½łçļĦ (你的)         0.17
Activations Density 0.398%