Explanations
references to personal choices and decision-making
Negative Logits
Peralta     -0.76
îna         -0.74
Tembelea    -0.73
rispar      -0.73
appellants  -0.73
dumne       -0.72
Dammit      -0.71
episódios   -0.71
παιδιά      -0.70
menina      -0.69
Positive Logits
Choices     1.71
choices     1.70
Choice      1.60
CHOICE      1.57
choice      1.57
Choices     1.50
choices     1.46
Choice      1.44
CHOICE      1.41
choice      1.40
Activations Density 0.055%