Index
Explanations
queries or reflective thoughts
reflections and self-directed questions
Negative Logits
pour    -0.78
nor     -0.74
nic     -0.73
udi     -0.68
von     -0.67
edia    -0.63
Sale    -0.63
Klu     -0.63
ga      -0.63
resa    -0.63
Positive Logits
é¾įå¥ij士        0.89
çīĪ             0.84
ħĭ              0.80
omething        0.79
subconscious    0.74
terday          0.73
puzzled         0.72
mate            0.71
anew            0.71
afterward       0.70
Activations Density 0.028%