Explanations
references to strangeness or unusualness in people or things
Negative Logits
Token            Logit
Munro            -0.79
Pott             -0.78
"..\..\..\       -0.76
Lott             -0.76
casquette        -0.75
↩                -0.73
metallo          -0.72
ctez             -0.71
AssemblyTitle    -0.71
parlant          -0.71
Positive Logits
Token            Logit
Stranger         0.91
stranger         0.90
Stranger         0.89
strangers        0.69
stranger         0.67
ordini           0.66
patrones         0.66
Strangers        0.66
Lange            0.64
Strange          0.63
Activations Density 0.004%