Explanations
proper names related to public figures and entities
references to specific individuals, particularly those named Diana
Negative Logits
Token       Logit
oos         -0.78
doors       -0.77
heimer      -0.74
oop         -0.73
fits        -0.72
oops        -0.72
ten         -0.70
heastern    -0.69
ears        -0.69
thren       -0.68
Positive Logits
Token       Logit
Diana       1.17
Wynne       0.84
ILY         0.79
racuse      0.75
Gab         0.75
Kali        0.73
Sachs       0.73
sacrific    0.71
Yor         0.71
eclipse     0.69
Activations Density: 0.025%