Explanations
references to the name "Joe" or variations of it
Negative Logits
̈́              -0.71
rind           -0.66
')")           -0.65
)");           -0.63
Michael        -0.63
FontAwesome    -0.62
Michael        -0.61
"],            -0.60
'))            -0.60
LEncoder       -0.60
Positive Logits
Joe            2.55
Joe            2.31
joe            2.11
JOE            2.04
JOE            1.77
Joseph         1.76
joe            1.75
Joseph         1.61
joseph         1.49
JOSEPH         1.39
Activations Density: 0.039%