INDEX
Explanations
phrases related to logical reasoning and conclusions
logical conclusions or statements that assert consequences or implications
Negative Logits
barely      -0.67
skirm       -0.65
buzz        -0.63
trickle     -0.63
just        -0.62
kids        -0.62
nearly      -0.61
stalking    -0.61
pops        -0.61
pop         -0.61
Positive Logits
Therefore       3.27
Therefore       3.01
Consequently    2.61
Hence           2.38
Accordingly     2.32
Thus            2.10
therefore       1.96
Thus            1.92
Nevertheless    1.92
Furthermore     1.85
Activations Density 0.015%
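
For a sense of scale, the density figure can be reproduced with a minimal sketch. It assumes, and the page does not state this, that activation density is the fraction of tokens on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the dashboard does not define it explicitly.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 20,000 tokens.
acts = [0.0] * 19_997 + [1.2, 0.4, 2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.015%"
```

Under that assumption, a density of 0.015% corresponds to the feature firing on about 15 of every 100,000 tokens.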