INDEX
Explanations
expressions of honesty or truthfulness
Negative Logits
docx
-0.60
Попис
-0.53
homonymie
-0.53
avajillas
-0.52
mea
-0.52
Ma
-0.50
mul
-0.50
Ma
-0.50
成了
-0.49
ordine
-0.49
Positive Logits
\{\\
0.94
practically
0.90
basically
0.76
myſelf
0.76
__":
0.75
EClass
0.74
contextLoads
0.73
assertTrue
0.73
Basically
0.72
Honestly
0.72
Activations Density 0.076%