INDEX
Explanations
phrases related to self-reflection and realization
expressions of realization or self-awareness
Negative Logits
ゴン            -0.68
recent          -0.65
earlier         -0.64
meanwhile       -0.63
recently        -0.62
stemming        -0.61
Lys             -0.59
Recent          -0.58
lately          -0.58
misunderstand   -0.57
Positive Logits
indistinguishable   0.90
apsed               0.83
suddenly            0.70
wered               0.68
unthinkable         0.68
Suddenly            0.66
Reviewer            0.66
uddenly             0.63
sed                 0.63
amorph              0.62
Activations Density 0.933%