INDEX
Explanations
phrases indicating knowledge or information
the phrase "that" in various contexts
Negative Logits
orah      -0.79
hens      -0.73
ãĥ¼ãĤ¯    -0.72
obb       -0.71
amia      -0.66
orian     -0.65
tails     -0.64
ield      -0.64
apolis    -0.62
estern    -0.61
Positive Logits
pesky       1.02
cher        0.76
they        0.75
there       0.75
although    0.71
same        0.70
fateful     0.70
whereas     0.70
someday     0.68
THEY        0.67
Activations Density: 0.203%