INDEX
Explanations
abbreviations or acronyms typically containing the letters 'k', 'a', and numbers
phrases indicating characters or story elements from various narratives
Negative Logits
ighters    -0.75
rition     -0.70
iscal      -0.67
letcher    -0.65
annabin    -0.65
pson       -0.65
encers     -0.63
cig        -0.62
aux        -0.61
urrent     -0.61
Positive Logits
theirs     0.80
=]         0.71
NAME       0.70
BILITY     0.70
hers       0.70
«ĺ         0.68
hler       0.66
ours       0.66
Dame       0.65
Dating     0.64
Activations Density 0.060%