Explanations
the word "concept" within text
references to the concept of "concept" in various contexts
Negative Logits
Token      Logit
Reply      -0.63
ishops     -0.61
ificant    -0.61
deen       -0.61
bye        -0.61
Peninsula  -0.60
loads      -0.59
cano       -0.58
Silence    -0.58
udos       -0.58
Positive Logits
Token      Logit
ually      1.72
ual        1.26
ical       0.96
ional      0.95
uality     0.87
agram      0.83
ivist      0.82
iple       0.81
ically     0.81
ively      0.80
Activations Density 0.030%
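
The two logit lists above read like the ten lowest and ten highest entries of a per-token logit vector for this feature. The sketch below shows one way such lists could be extracted, assuming the per-token values are already available as a mapping. The function name `top_logit_tokens` and the `example` dictionary are hypothetical; the sample values are copied from the lists above, and how the underlying vector is computed is not stated in the source.

```python
from typing import Dict, List, Tuple

TokenLogit = Tuple[str, float]


def top_logit_tokens(
    logits: Dict[str, float], k: int = 10
) -> Tuple[List[TokenLogit], List[TokenLogit]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper: assumes `logits` maps each token string to the
    feature's effect on that token's output logit.
    """
    ranked = sorted(logits.items(), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# A few values copied from the lists above, for illustration only.
example = {
    "ually": 1.72,
    "ual": 1.26,
    "ical": 0.96,
    "Reply": -0.63,
    "ishops": -0.61,
    "loads": -0.59,
}

negative, positive = top_logit_tokens(example, k=2)
for token, value in negative + positive:
    print(f"{token:<10}{value:>6.2f}")
```

With the full vocabulary as input and `k=10`, the same call would yield lists of the shape shown above, provided the assumption about how they were produced holds.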