INDEX
Explanations
the word "units" in diverse contexts
Negative Logits
houſe  -0.76
Nacionales  -0.74
Cæsar  -0.73
pleaſure  -0.71
i  -0.71
dieux  -0.71
noastre  -0.70
Jefus  -0.70
Houſe  -0.70
Monfieur  -0.69
Positive Logits
')]  1.05
')],  0.98
']],  0.98
'))  0.94
')}  0.91
']}  0.87
'],  0.86
')")  0.86
?')  0.84
'),  0.82
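The two lists above rank output tokens by how strongly the feature pushes their logits down or up. As a minimal sketch, assuming the common approach of projecting the feature's decoder direction through the model's unembedding matrix (the names `w_dec`, `W_U`, and `top_logit_tokens` are hypothetical and not taken from this dashboard), such lists could be produced like this:

```python
import numpy as np


def top_logit_tokens(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumptions (hypothetical, for illustration only):
      w_dec: decoder direction of one feature, shape (d_model,)
      W_U:   unembedding matrix, shape (d_model, d_vocab)
      vocab: list of token strings, length d_vocab
    """
    # Direct effect of the feature direction on every output logit.
    logits = np.asarray(w_dec) @ np.asarray(W_U)
    # Indices sorted from most negative to most positive logit.
    order = np.argsort(logits)
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    w_dec = np.array([1.0, 0.0])
    W_U = np.array([[0.5, -0.2, 1.0],
                    [0.0, 0.0, 0.0]])
    neg, pos = top_logit_tokens(w_dec, W_U, ["a", "b", "c"], k=1)
    print(neg, pos)
```

Real dashboards may also fold in layer normalization or other scaling before this projection, so the exact values shown above need not match a plain matrix product.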
Activation Density 1.117%
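Activation density is usually read as the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero, computed over a hypothetical sample `acts` of per-token activations (the function name is also hypothetical):

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(acts, dtype=float)
    # Count tokens where the feature is active, then express as a percentage.
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


if __name__ == "__main__":
    sample = [0.0, 0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
    print(activation_density(sample))  # 2 of 8 tokens active -> 25.0
```

Under this reading, a density of 1.117% means the feature is active on roughly 1 token in 90.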