Explanations
bootstrap and increased values
Negative Logits
$-(          0.58
(-(          0.52
//$          0.51
-(           0.50
//}          0.49
-(           0.48
}$-(         0.41
(!(          0.40
allegedly    0.39
//{          0.38
Positive Logits
разум        0.44
Bootstrap    0.44
Thats        0.43
Thats        0.41
Increased    0.40
increased    0.40
bootstrap    0.39
Increased    0.39
bootstrap    0.39
Boot         0.38
Activations Density 0.001%