INDEX
Explanations
mentions of a specific term "Cl" followed by a number
the symbol "<|endoftext|>" and the term "Cl 2"
Negative Logits
Democr      -0.78
htt         -0.72
Lans        -0.66
ãĤ®         -0.65
ALS         -0.60
SPD         -0.59
Palestin    -0.59
den         -0.59
conclud     -0.59
revolving   -0.58
Positive Logits
utch        1.36
oser        1.32
iffs        1.30
othes       1.29
ipper       1.27
ients       1.27
andestine   1.20
iff         1.19
usters      1.19
avier       1.18
Activations Density 0.015%