INDEX
Explanations
occurrences of the word "our"
NEGATIVE LOGITS
revert       -0.64
minster      -0.63
buck         -0.62
FU           -0.61
Tanz         -0.59
citation     -0.58
Rasmussen    -0.57
stem         -0.57
mesh         -0.57
rusher       -0.56
POSITIVE LOGITS
our          1.25
selves       1.24
neau         1.01
neys         1.00
ours         0.94
¯¯¯¯¯¯¯¯     0.89
OUR          0.86
izont        0.86
ouring       0.85
oux          0.81
Activations Density 0.010%