INDEX
Explanations
phrases related to introductions and introductory language
Negative Logits
ma  -0.72
na  -0.67
m  -0.65
mathsf  -0.64
Gü  -0.64
mo  -0.63
5  -0.62
Vegas  -0.61
iritual  -0.61
m  -0.60
Positive Logits
Introduce  2.11
introduction  2.11
introductions  2.05
introduce  2.02
introduces  2.01
introducing  1.93
Introdu  1.92
introduce  1.92
Introduce  1.89
introdu  1.88
Activations Density 0.050%