INDEX
Explanations
specific strings like 'ON' or 'in' within a text
specific prepositions and their variations
Negative Logits
Revelations   -0.74
Quadro        -0.68
ATTLE         -0.66
laughter      -0.64
ij士          -0.63
enance        -0.63
Penguins      -0.62
---------     -0.59
exits         -0.58
Buildings     -0.58
Positive Logits
jin           1.04
ichi          0.98
ghai          0.92
kered         0.92
hiro          0.91
ji            0.91
ju            0.88
hao           0.88
ori           0.88
nan           0.87
Activations Density 0.095%