INDEX
Explanations
references to the act of finding or discovering something
Negative Logits
zien       -0.59
ⓧ          -0.57
observed   -0.55
querque    -0.54
Known      -0.52
Known      -0.51
Against    -0.51
khana      -0.51
ahn        -0.50
Observed   -0.49
Positive Logits
anywhere    1.14
somewhere   1.01
everywhere  0.89
anywhere    0.88
elsewhere   0.85
someplace   0.83
nowhere     0.80
somewhere   0.76
Anywhere    0.76
everywhere  0.75
Activations Density: 0.246%