INDEX
Explanations
anti- followed by warfare types
Negative Logits
Token        Logit
Keychain     0.43
perifer      0.41
츨           0.41
सेफ्टी       0.41
competitive  0.40
уйнау        0.40
clidean      0.39
slam         0.39
competitive  0.39
безпе        0.39
Positive Logits
Token        Logit
aircraft     0.63
Aircraft     0.61
Aircraft     0.60
aircraft     0.58
tank         0.55
Tank         0.54
submarine    0.54
ship         0.52
tank         0.52
Tank         0.50
Activations Density: 0.002%