INDEX
Explanations
people's names
occurrences of specific keywords and proper nouns
Negative Logits
Matth             -0.83
NCT               -0.73
Assassin          -0.72
hei               -0.71
================  -0.70
Cumm              -0.70
ogly              -0.69
cuc               -0.69
Breat             -0.68
Cig               -0.68
Positive Logits
urdue             0.95
ung               0.86
roy               0.81
ten               0.80
water             0.77
wald              0.76
orean             0.76
ACTED             0.76
bara              0.76
uned              0.76
Activations Density 0.427%