Explanations
references to data models
Negative Logits
houſe        -0.94
Anſ          -0.90
Monfieur     -0.87
Shakspeare   -0.86
Houſe        -0.86
Theſe        -0.84
Efq          -0.82
Shaksp       -0.81
UserScript   -0.81
Jefus        -0.80
Positive Logits
Models       1.93
models       1.90
Model        1.86
Models       1.84
Model        1.84
models       1.65
MODEL        1.62
model        1.61
MODELS       1.60
MODEL        1.49
Activations Density 0.134%