Explanations
real or simulated experiences
Negative Logits
strongly      0.46
sufficiently  0.45
reliably      0.45
consistently  0.44
purely        0.44
Strongly      0.44
terrestre     0.42
repeatedly    0.40
ait           0.40
lal           0.39

Positive Logits
simulated     0.94
Simulated     0.83
模擬          0.80
mock          0.78
真实的        0.77
実際の        0.71
настоя        0.68
実際に        0.68
실제          0.66
模拟          0.66
Activations Density: 0.030%