Explanations
details related to finances, politics, and military operations
Negative Logits
ãĥį (ネ)  -0.71
confir  -0.66
Orig  -0.64
é¾įå (龍 + incomplete UTF-8 byte)  -0.61
Klu  -0.58
ãĥ« (ル)  -0.58
ãĤ¨ãĥ« (エル)  -0.57
Pengu  -0.56
Ô (incomplete UTF-8 byte)  -0.55
APP  -0.54
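Several of the negative-logit tokens (for example ãĥį or ãĤ¨ãĥ«) are printed in the byte-level form used by GPT-2-style BPE tokenizers, where each raw byte is mapped to a printable character. The page does not name the tokenizer; the sketch below assumes GPT-2's byte-to-unicode table (the token strings are consistent with it) and shows how such strings can be turned back into text. `decode_token` is a hypothetical helper, not a library function.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2-style byte -> printable-character table (assumed tokenizer)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Invert the table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed byte-level token (hypothetical helper) back to UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens that stop partway through a multi-byte character cannot be fully
    # decoded; the leftover bytes become U+FFFD replacement characters.
    return raw.decode("utf-8", errors="replace")


for tok in ["ãĥį", "ãĤ¨ãĥ«", "é¾įå", "Ô"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĥį decodes to ネ and ãĤ¨ãĥ« to エル, while é¾įå and Ô end partway through a multi-byte character, so only the complete part (龍) survives and the rest becomes the replacement character.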
Positive Logits
etc  1.40
etc  1.00
ect  0.94
…)  0.88
,...  0.88
,  0.86
…  0.80
...)  0.76
…  0.74
blah  0.70
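The page does not say how the logit lists were produced. A common approach for sparse-autoencoder feature dashboards, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the most negative and most positive entries. The minimal NumPy sketch below uses a made-up two-dimensional model with hypothetical token names; `top_logits` is an illustrative helper, not a library function.

```python
import numpy as np


def top_logits(w_dec_row: np.ndarray, w_unembed: np.ndarray,
               vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive (token, logit) pairs."""
    # Direct effect of the feature direction on every vocabulary logit.
    logits = w_dec_row @ w_unembed  # shape: (d_vocab,)
    order = np.argsort(logits)      # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy model: d_model = 2, vocabulary of 4 tokens, made-up weights.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_unembed = np.array([[1.0, 0.5, -0.2, -0.6],
                      [0.4, 0.3, -0.4, 0.1]])
w_dec_row = np.array([1.0, 0.5])

neg, pos = top_logits(w_dec_row, w_unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

The sketch ignores any normalization or scaling a real pipeline might apply before ranking.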
Activation Density: 0.243%
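Activation density is typically the fraction of token positions on which a feature's activation is above zero. Taking that definition as an assumption (the page does not define it), 0.243% means the feature fires on roughly 2.4 of every 1,000 tokens. A minimal sketch of the calculation, using hypothetical activation values:

```python
import numpy as np


def activation_density(activations: np.ndarray) -> float:
    """Percentage of token positions where the feature fires (activation > 0)."""
    fired = np.count_nonzero(activations > 0)
    return 100.0 * fired / activations.size


# Hypothetical activations over 8 token positions.
acts = np.array([0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0])
print(f"{activation_density(acts):.3f}%")  # 25.000%
```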