Explanations
targets or entities that are being identified or aimed at
references to specific objectives or targets
Negative Logits
OVA -0.71
haz -0.68
pole -0.66
cup -0.65
john -0.65
htt -0.65
alus -0.63
uitive -0.62
inct -0.62
inus -0.62
Positive Logits
targets 4.09
target 2.72
target 2.09
Target 1.83
targeted 1.82
Targ 1.79
targeting 1.74
Target 1.70
targ 1.66
objectives 1.61
Activation Density: 0.009%
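The two logit lists and the density figure can be illustrated with a small sketch. This page does not say how the scores were computed. The sketch assumes each vocabulary token already has one scalar score for this feature, and that "density" means the fraction of tokens on which the feature's activation is positive. The function names `activation_density` and `top_and_bottom_logits` are hypothetical. Pairs are kept in a list rather than a dict because the lists above contain repeated strings ("target", "Target") that are probably distinct tokens whose whitespace was lost in extraction.

```python
from typing import List, Sequence, Tuple

TokenScore = Tuple[str, float]


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature is active (assumed: activation > 0)."""
    if len(activations) == 0:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


def top_and_bottom_logits(
    scores: Sequence[TokenScore], k: int
) -> Tuple[List[TokenScore], List[TokenScore]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs.

    A list of pairs is used instead of a dict because distinct vocabulary
    entries can render as the same string (e.g. with and without a leading space).
    """
    ranked = sorted(scores, key=lambda pair: pair[1])
    bottom = ranked[:k]                    # most negative first
    top = list(reversed(ranked))[:k]       # most positive first
    return top, bottom


if __name__ == "__main__":
    # A few (token, score) pairs copied from the lists above.
    scores = [("targets", 4.09), ("objectives", 1.61), ("OVA", -0.71), ("haz", -0.68)]
    top, bottom = top_and_bottom_logits(scores, 2)
    print("positive:", top)     # [('targets', 4.09), ('objectives', 1.61)]
    print("negative:", bottom)  # [('OVA', -0.71), ('haz', -0.68)]

    # 9 active tokens out of 100,000 gives 0.009%, the density reported above.
    density = activation_density([1.0] * 9 + [0.0] * 99_991)
    print(f"{density:.3%}")     # 0.009%
```

In other words, a density of 0.009% means the feature fires on roughly 9 of every 100,000 tokens, so it is a very sparse feature.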