INDEX
Explanations
content related to news articles or reports
segments where self-reflection or personal experiences are discussed
Negative Logits
blinded       -0.75
undermin      -0.73
engagements   -0.71
footing       -0.70
wiser         -0.69
proport       -0.68
capit         -0.68
everal        -0.65
composing     -0.65
endeavour     -0.65
Positive Logits
AMY           0.97
MJ            0.92
âĹ¼           0.88
³³³³³³³³      0.83
POST          0.82
laughs        0.80
NER           0.80
Interview     0.80
Anyway        0.78
JC            0.78
Activations Density 0.281%
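
The density figure above can be read as the share of tokens on which the feature fires. The page does not define it, so the following is only a minimal sketch under that assumption: it treats any activation above zero as "active". The function name `activation_density` and the `threshold` parameter are hypothetical and do not come from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The dashboard itself does not state its definition.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only: 281 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 281 + [0.0] * (100_000 - 281)
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.281% would mean the feature is active on roughly 3 of every 1,000 tokens, which fits a sparse feature tied to a narrow context such as interview or transcript text.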