Explanations
phrases referring to a specific subject or topic
instances of the word "this"
Negative Logits
ãģ®ç      -0.60
ettes     -0.59
rang      -0.59
rider     -0.56
jon       -0.56
iae       -0.56
lia       -0.56
uster     -0.54
ylum      -0.53
eker      -0.52
Positive Logits
this       2.34
this       1.81
THIS       1.71
these      1.65
these      1.28
THIS       1.19
THESE      1.13
tonight    1.09
This       1.07
This       1.01
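Logit tables like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and keeping the largest and smallest entries. The sketch below shows that computation under that assumption; the dashboard itself does not state its method. The function names (logit_effects, top_and_bottom), the 2-dimensional toy model, and all weights are hypothetical illustrations, not the weights behind this feature.

```python
# Sketch under assumptions: a feature's logit effect is taken to be the dot
# product of its decoder direction with each token's unembedding vector.
# All numbers below are hypothetical toy values, not this feature's weights.
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Project the decoder direction onto every token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k largest (positive) and k smallest (negative) effects."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Hypothetical 2-dimensional toy model with a five-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    "this": [2.0, 0.5],
    "these": [1.5, 0.2],
    "tonight": [0.8, 0.4],
    "rider": [-0.5, -0.1],
    "ettes": [-0.6, -0.2],
}

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("Positive:", positive)
print("Negative:", negative)
```

With a real model the same ranking runs over the full vocabulary, so tokens that print identically (such as the two "this" rows) can still be separate vocabulary entries.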
Activation Density: 0.217%
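One common reading of activation density is the percentage of sampled token positions on which the feature's activation is above zero. The sketch below assumes that definition; the dashboard does not give its threshold or dataset. The activation_density helper and the sample data are hypothetical.

```python
# Sketch under assumptions: "activation density" is read here as the share of
# sampled token positions where the feature's activation is above a threshold
# (0.0 by default). The dashboard does not state its threshold or dataset.
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of positions whose activation exceeds `threshold`."""
    values = list(activations)
    if not values:
        raise ValueError("activation_density needs at least one activation")
    active = sum(1 for a in values if a > threshold)
    return 100.0 * active / len(values)


# Hypothetical sample: 217 active positions out of 100,000 tokens.
sample = [0.0] * 99_783 + [1.0] * 217
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.217% means the feature fires on roughly 1 token in 460 (100 / 0.217 ≈ 460.8).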