Explanations
Words related to specific targets or goals
Instances of the word "target" and related contexts
Negative Logits
Token        Logit
UGE          -0.68
fo           -0.68
Geological   -0.67
IGH          -0.67
ISTORY       -0.67
notor        -0.62
ERROR        -0.62
plet         -0.61
ansk         -0.61
AX           -0.60
Positive Logits
Token        Logit
ted           1.13
targets       0.97
target        0.92
nels          0.81
izen          0.80
oided         0.77
eers          0.76
ishi          0.75
target        0.73
ataka         0.72
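As a rough illustration of where lists like these usually come from: feature dashboards of this kind commonly rank vocabulary tokens by the feature's direct effect on the output logits, i.e. the feature's decoder direction projected through the model's unembedding matrix. The sketch below assumes that method. The function `top_logit_tokens`, the names `decoder_dir` and `W_U`, and the toy sizes are hypothetical, and the random weights do not reproduce the values listed above.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: the feature's logit effect is its decoder direction
    (shape: d_model) multiplied by the unembedding (shape: d_model x vocab_size).
    """
    logits = decoder_dir @ W_U            # direct effect on every token, shape (vocab_size,)
    order = np.argsort(logits)            # token indices sorted by ascending logit
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]        # most negative first
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # most positive first
    return neg, pos

# Toy example with random weights (hypothetical sizes, not a real model).
rng = np.random.default_rng(0)
d_model, vocab_size = 8, 50
W_U = rng.normal(size=(d_model, vocab_size))
decoder_dir = rng.normal(size=d_model)
vocab = [f"tok{i}" for i in range(vocab_size)]

neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=5)
for token, value in pos:
    print(f"{token:<10} {value:+.2f}")
```

Sorting once and reading both ends of the order yields the negative and positive lists in a single pass; a real dashboard may also normalize the decoder direction or apply the final layer norm first, which this sketch does not do.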
Activation Density: 0.016%