INDEX
Explanations
the word "steel" at different levels of relevance, with some activations indicating a very strong match
references to steel
Negative Logits
Token      Logit
Kard       -0.76
Niet       -0.74
Garr       -0.73
romeda     -0.72
DOE        -0.69
[|         -0.67
itia       -0.66
Chomsky    -0.64
uate       -0.64
annah      -0.63
Positive Logits
Token      Logit
works      1.06
wool       1.05
Series     1.03
workers    1.00
worker     0.96
steel      0.94
anguage    0.91
fish       0.91
beams      0.87
Steel      0.87
Activations Density: 0.020%