INDEX
Explanations
phrases that indicate specific conditions or circumstances when something occurs
when it comes to
Negative Logits
Token        Logit
purpoſe      -0.95
himſelf      -0.85
ſtate        -0.84
houſe        -0.80
ſame         -0.76
Jefus        -0.76
rodríguez    -0.73
auroit       -0.73
ſy           -0.72
Majefty      -0.71
Positive Logits
Token        Logit
when         1.09
WHEN         1.01
When         0.99
when         0.98
WHEN         0.93
When         0.91
they         0.88
we           0.84
när          0.82
cuando       0.78
Activations Density 0.102%
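
As a rough illustration of how a density figure like the one above could be computed, here is a minimal sketch. It assumes that density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name activation_density, and the sample data are all assumptions made for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires at all.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical sample: 10,000 token positions, 10 of which fire.
sample = [0.0] * 9990 + [1.5] * 10
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.102% would mean the feature fires on roughly one token position in a thousand.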