INDEX
Explanations
personal experiences or actions described in the first person
occurrences of the pronoun "I" and related personal references
Negative Logits
token        logit
loops        -0.69
decoration   -0.68
Haku         -0.66
dies         -0.63
2048         -0.62
INGTON       -0.62
stagnation   -0.61
Ezekiel      -0.59
Ĥİ           -0.59
reversible   -0.58
Positive Logits
token        logit
ldon         0.99
xtap         0.92
ñ            0.87
cade         0.80
emen         0.80
adr          0.80
opia         0.79
aci          0.79
arag         0.79
atri         0.78
Activations Density 0.322%