INDEX
Explanations
contractions of "get it" with high activation values
instances of the word "it"
Negative Logits
Token     Logit
notice    -0.72
Mans      -0.71
ILE       -0.65
ãĤ±       -0.64
ãĥ´ãĤ¡    -0.64
911       -0.63
767       -0.59
VA        -0.59
Friend    -0.59
762       -0.58
Positive Logits
Token       Logit
chy         1.23
alian       0.86
unes        0.86
iner        0.81
atic        0.69
backwards   0.67
atically    0.67
geist       0.67
asca        0.67
ueller      0.66
Activations Density 0.079%
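
A minimal sketch of how a density figure like the one above could be computed, assuming it means the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions for illustration and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# active positions) / (# positions sampled).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 8 of which activate the feature.
sample = [0.0] * 9992 + [1.5] * 8
print(f"{activation_density(sample):.3%}")  # prints 0.080%
```

Under this reading, a density of 0.079% would mean the feature fires on roughly 8 of every 10,000 tokens sampled.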