INDEX
Explanations
phrases that express skepticism or critical analysis
Negative Logits
kå      -0.14
ANTE    -0.14
ocab    -0.14
éru     -0.14
åŀ      -0.14
ueva    -0.13
ieber   -0.13
ussen   -0.13
ante    -0.13
bab     -0.13
Positive Logits
deeper   0.50
deep     0.44
深       0.40
digging  0.40
depth    0.39
dig      0.39
deep     0.37
Dig      0.37
Dig      0.36
dig      0.35
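The page does not say how these logit scores were produced. A common approach for sparse-autoencoder features is to project the feature's decoder direction onto the model's unembedding matrix and rank the vocabulary by the result. The minimal pure-Python sketch below illustrates that assumed approach; `top_logits`, `direction`, and `unembed` are hypothetical names, not part of any specific library.

```python
def top_logits(direction, unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly a feature direction promotes them.

    Assumption (hypothetical, not stated on the page): the score for token j
    is the dot product of the feature's decoder direction with column j of the
    unembedding matrix W_U.

    direction: list of d_model floats (the feature's decoder row)
    unembed:   d_model rows, each a list of vocab_size floats (W_U)
    vocab:     list of vocab_size token strings
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    d_model = len(direction)
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]
    top_positive = list(reversed(ranked))[:k]
    return top_positive, top_negative


# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
pos, neg = top_logits([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
                      ["deep", "ante", "x"], k=1)
print(pos, neg)
```

Positive scores mark tokens the feature pushes the model toward (here, "deep"-related tokens); negative scores mark tokens it suppresses.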
Activation Density: 0.265%
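Activation density is usually reported as the share of tokens on which a feature fires. Under that reading, 0.265% means about 2.65 activating tokens per 1,000. The small sketch below assumes the common convention that "fires" means an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption (hypothetical): a feature counts as active on a token when its
    activation is strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation to compute a density")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 1,000 tokens, 2 of which have a positive activation.
acts = [0.0] * 997 + [1.2, 0.3, 0.0]
print(activation_density(acts))
```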