INDEX
Explanations
the presence of the word "et."
Negative Logits
purpoſe          -0.77
deſt             -0.72
pleaſure         -0.71
perfons          -0.69
raiſ             -0.69
caufe            -0.69
cauſe            -0.68
Diſ              -0.68
WithIOException  -0.68
Eſ               -0.66
Positive Logits
al    1.56
al    0.87
Al    0.81
AL    0.61
alia  0.60
et    0.57
Al    0.55
el    0.53
ai    0.52
AL    0.52
Activations Density 0.090%