Explanations
mentions of prominent individuals or notable figures
references to notable individuals, particularly in a contextual or relational manner
Negative Logits
Token       Logit
actionDate  -0.78
»Ĵ          -0.75
atform      -0.74
Enlarge     -0.69
manent      -0.69
sequent     -0.68
incial      -0.65
obos        -0.63
etheless    -0.61
farious     -0.61
Positive Logits
Token     Logit
hates     1.63
deserves  1.54
loves     1.48
ain       1.44
sucks     1.43
wants     1.39
knows     1.37
owes      1.30
thinks    1.30
shouldn   1.30
Activations Density: 0.369%