INDEX
Explanations
instances of the word "it" in varying contexts
Negative Logits
ishly     -0.17
haft      -0.15
ï         -0.15
ulty      -0.14
odge      -0.14
cout      -0.14
sic       -0.14
ulture    -0.14
iston     -0.14
hana      -0.13
Positive Logits
iner      0.41
chy       0.31
/her      0.29
/th       0.28
zelf      0.27
/us       0.26
unes      0.26
self      0.23
inerary   0.23
SELF      0.23
Activations Density 0.184%