Explanations
expressions of self-identity and personal characteristics
Negative Logits
desconhe          -0.61
neceffary         -0.58
Jefus             -0.57
ResponseWriter    -0.57
preſent           -0.57
subordinated      -0.56
unknowns          -0.55
myſelf            -0.55
hostilities       -0.52
Mongols           -0.52
Positive Logits
notoriously       0.92
notorious         0.89
brukar            0.73
prone             0.73
famously          0.73
known             0.72
always            0.71
usually           0.71
often             0.70
obsessive         0.69
Activations Density: 0.232%