INDEX
Explanations
expressions of knowledge and awareness
Negative Logits
esgue              -0.78
AccessorTable      -0.64
lrrrr              -0.61
vuitton            -0.59
reflections        -0.58
printStackTrace    -0.57
Reflections        -0.57
Prag               -0.56
hithe              -0.56
CRITICAL           -0.56
Positive Logits
know               1.86
know               1.82
knows              1.81
Know               1.74
Know               1.69
knows              1.64
KNOW               1.60
KNOW               1.55
knowing            1.50
knew               1.49
Activations Density: 0.260%