INDEX
Explanations
terms related to decision-making and accountability
Negative Logits
peindre      -1.02
défendre     -0.84
boire        -0.76
remercier    -0.76
joindre      -0.75
courir       -0.74
croire       -0.74
mourir       -0.74
combattre    -0.73
فريبيس       -0.72
Positive Logits
confirmer    0.62
changer      0.55
studier      0.55
gifter       0.54
hancer       0.53
isher        0.53
Carver       0.53
stretcher    0.53
asker        0.53
toucher      0.52
Activations Density 0.849%