INDEX
Explanations
references to the concept of being a proxy for something else
references to "proxy" and related concepts
Negative Logits
ynski  -0.93
alos  -0.84
ardy  -0.83
ership  -0.81
nen  -0.80
ivism  -0.80
ews  -0.79
ymes  -0.79
ŃĶ  -0.78
opathy  -0.76
Positive Logits
proxy  0.95
proxies  0.83
Tempest  0.79
wars  0.71
landsl  0.70
sibling  0.69
warfare  0.67
versa  0.65
aggression  0.65
flare  0.64
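The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. The sketch below shows one common way to produce such lists. It assumes that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The names `decoder_direction`, `unembed`, and `top_and_bottom` are hypothetical. Real dashboards may also scale or normalize the scores before displaying them.

```python
def logit_effects(decoder_direction, unembed):
    """Score each token by the dot product of the (assumed) decoder direction
    with that token's unembedding vector. `unembed` maps token -> vector."""
    return {
        tok: sum(d * w for d, w in zip(decoder_direction, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    positive = ranked[-k:][::-1]
    negative = ranked[:k]
    return positive, negative


# Toy two-dimensional example with made-up vectors (not real model weights).
toy_unembed = {
    "proxy": [0.9, 0.1],
    "wars": [0.7, 0.0],
    "ynski": [-0.9, 0.2],
}
scores = logit_effects([1.0, 0.0], toy_unembed)
positive, negative = top_and_bottom(scores, 1)
```

Here `positive` holds the single most-promoted token and `negative` holds the single most-suppressed token, mirroring the two lists above.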
Activations Density: 0.027%
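Activation density is typically read as the fraction of tokens on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name and the threshold parameter are illustrative and do not come from the dashboard. Under that reading, 0.027% corresponds to about 27 firing tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 27 firing tokens out of 100,000 gives a density of 0.00027, i.e. 0.027%.
acts = [0.0] * 99_973 + [1.0] * 27
density = activation_density(acts)
print(f"{density:.3%}")
```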