INDEX
Explanations
phrases related to introductions and presenting people or concepts
Negative Logits
ma	-0.70
رى	-0.66
na	-0.66
na	-0.63
ar	-0.63
5	-0.63
m	-0.63
mo	-0.62
pis	-0.61
m	-0.61
Positive Logits
Introduce	1.71
introduces	1.61
introductions	1.58
Introduce	1.56
introduce	1.54
introduction	1.53
introduce	1.52
introdu	1.48
Introducing	1.47
introducing	1.47
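
The page does not say how these logit lists are produced. A common convention for feature dashboards of this kind is to project the feature's output direction through the model's unembedding matrix and list the tokens with the most negative and most positive resulting values. The sketch below illustrates that convention only. The function name `top_logit_tokens`, the two-dimensional direction, and the toy unembedding values are all hypothetical and are not taken from this feature.

```python
def top_logit_tokens(direction, unembedding, vocab, k=10):
    """Rank vocabulary tokens by the logit effect of one feature direction.

    direction:   list of d floats (assumed feature output direction)
    unembedding: d rows, each holding one float per vocabulary token
    vocab:       list of token strings, aligned with the unembedding columns
    Returns (most_negative, most_positive), each a list of (token, value).
    """
    d = len(direction)
    # Logit effect per token: dot product of the direction with each column.
    effects = [
        sum(direction[i] * unembedding[i][j] for i in range(d))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # smallest values first
    most_positive = ranked[::-1][:k]    # largest values first
    return most_negative, most_positive


# Toy data, invented for illustration (not the real model weights).
vocab = ["ma", "na", "introduce", "Introduce"]
direction = [1.0, 0.5]
unembedding = [
    [-0.6, -0.5, 1.2, 1.4],
    [-0.2, -0.3, 0.7, 0.6],
]

neg, pos = top_logit_tokens(direction, unembedding, vocab, k=2)
for token, value in neg:
    print(f"{token}\t{value:.2f}")
for token, value in pos:
    print(f"{token}\t{value:.2f}")
```

Under this reading, repeated entries such as the two `Introduce` rows would correspond to distinct vocabulary tokens (for example, with and without a leading space) whose difference was lost when the page was rendered as text. That is an assumption, not something the page states.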
Activations Density 0.066%