Explanations
topics of discussion or conversation
references to various subjects or themes being discussed
Negative Logits
rush      -0.66
ANY       -0.61
Rouge     -0.61
uin       -0.60
berto     -0.60
rection   -0.60
claw      -0.60
rip       -0.59
NEY       -0.58
pex       -0.58
Positive Logits
topics      3.64
Topics      2.52
topic       2.36
subjects    1.97
themes      1.86
Topics      1.75
topic       1.68
Topic       1.62
Topic       1.59
questions   1.58
Activations Density 0.017%