INDEX
Explanations
themes related to identity and self-reflection
Head Attr Weights
0:0.20
1:0.08
2:0.04
3:0.08
4:0.03
5:0.07
6:0.03
7:0.02
8:0.06
9:0.09
10:0.05
11:0.19
Negative Logits
introductory:-1.53
proactive:-1.47
inois:-1.46
administr:-1.46
listings:-1.44
listing:-1.44
structured:-1.38
listed:-1.37
standalone:-1.36
undergraduate:-1.36
Positive Logits
Already:1.69
Jew:1.66
Send:1.61
Reply:1.59
speech:1.52
Gh:1.51
Everyone:1.49
Mut:1.49
Almost:1.48
Jews:1.47
Activations Density 0.018%