Explanations
phrases related to contributions or contradictions
Negative Logits
Token       Logit
Monk        -0.67
Reboot      -0.66
Palace      -0.65
ilee        -0.65
lyn         -0.65
bells       -0.64
vows        -0.63
MBA         -0.63
undergrad   -0.62
knee        -0.62
Positive Logits
Token       Logit
cont         3.98
Cont         2.63
Cont         1.79
CONT         1.66
CONT         1.40
dist         1.17
contact      1.17
cont         1.16
comp         1.16
controller   1.11
Activation density: 0.006%