Explanations
phrases related to casual conversation and social interaction
Negative Logits
Token           Logit
Additionally    -0.81
Furthermore     -0.81
Additionally    -0.75
Furthermore     -0.74
poiché          -0.67
Moreover        -0.67
additionally    -0.63
bigsqcup        -0.61
furthermore     -0.61
Moreover        -0.60
Positive Logits
Token           Logit
gotta           1.10
need            1.06
gonna           1.05
Gonna           1.02
dunno           1.00
got             0.99
gonna           0.97
Gonna           0.96
Need            0.95
Gotta           0.94
Activations Density 0.337%