INDEX
Explanations
proper nouns such as names of people, places, and organizations
references to notable people, places, or organizations
Negative Logits
¿½
-0.99
Ezek
-0.73
Mehran
-0.61
EMP
-0.60
expansive
-0.60
Mell
-0.60
COMPLE
-0.59
nascent
-0.58
Naples
-0.58
asury
-0.58
Positive Logits
sucks      1.30
ain        1.22
!!!        1.07
doesnt     1.06
?!         1.06
hates      1.05
!!!!       1.04
!!         0.97
!?         0.96
huh        0.96
Activations Density 0.646%