Explanations
horizontal separators or dividers in textual data
Negative Logits
myſelf    -0.75
Inſ       -0.75
pleaſure  -0.75
Perſ      -0.65
Monfieur  -0.65
feroit    -0.64
greateſt  -0.64
queſta    -0.63
Houſe     -0.62
Anſ       -0.61
Positive Logits
---    1.40
---    1.10
***    1.04
---    0.98
___    0.89
//---  0.79
***    0.76
.---   0.76
"---   0.68
("---  0.67
Activations Density 0.219%