Explanations
moments/events that are impactful or emotionally charged
references to significant moments or events
Negative Logits
ä¸Ĭ            -0.80
ãĤ´            -0.73
GH             -0.72
bard           -0.67
ãģĮ            -0.67
tsky           -0.65
emale          -0.65
tti            -0.64
redistributed  -0.63
textbooks      -0.62
Positive Logits
ous            1.25
ary            1.24
aries          1.13
aneously       1.11
ously          1.06
icity          0.90
ARY            0.90
arily          0.84
aneous         0.82
ues            0.81
Activations Density: 0.025%
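
Some entries in the negative logits list (`ä¸Ĭ`, `ãĤ´`, `ãģĮ`) look like mojibake but are most likely raw vocabulary strings from a byte-level BPE tokenizer. The page does not name the tokenizer, so the sketch below assumes the GPT-2 style byte-to-character table; `decode_token` and `CHAR_TO_BYTE` are illustrative names, not part of any dashboard API. It shows one way to map such strings back to readable text.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2 style byte-level BPE.

    Assumption: the model behind this dashboard uses this table.
    """
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table: vocabulary character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw vocabulary string back into UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A single token can hold a partial UTF-8 sequence, so replace bad bytes.
    return raw.decode("utf-8", errors="replace")


for tok in ["ä¸Ĭ", "ãĤ´", "ãģĮ", "bard"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption the three odd-looking entries decode to the CJK characters 上, ゴ, and お, while plain ASCII tokens such as `bard` are unchanged. If the model uses a different tokenizer, the table above does not apply.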