INDEX
Explanations
words related to physical entities or actions related to trenches
references to trenches
Negative Logits
Ring      -0.79
visual    -0.77
ghai      -0.76
ivia      -0.74
isphere   -0.74
estate    -0.72
bah       -0.71
cius      -0.70
iesel     -0.69
isc       -0.68
Positive Logits
anty      0.87
coat      0.83
Warfare   0.78
rower     0.76
warfare   0.75
Hare      0.71
frontier  0.70
ãĥ¤       0.69
auld      0.67
Colossus  0.65
Activations Density 0.050%