INDEX
Explanations
occurrences of the word "in."
Negative Logits
Token     Logit
671       -0.17
vert      -0.15
ãĥ¥       -0.15
ivel      -0.14
iling     -0.14
anto      -0.14
bat       -0.14
aha       -0.14
etz       -0.14
elt       -0.13
Positive Logits
Token     Logit
tow       0.31
sight     0.30
play      0.25
reach     0.23
hand      0.21
Sight     0.21
plain     0.19
Evidence  0.18
ighted    0.18
evidence  0.18
Activations Density: 0.161%