INDEX
Explanations
phrases related to specific nouns or names, often starting with a capital letter
references to specific names and terms associated with notable movies or characters
Negative Logits
▬            -0.87
indo         -0.82
hetic        -0.66
iments       -0.66
partName     -0.65
FTWARE       -0.65
▬▬           -0.65
hetically    -0.64
iment        -0.63
eco          -0.62
Positive Logits
shots        0.81
iewicz       0.80
Leaks        0.73
antha        0.71
roots        0.71
lord         0.71
shot         0.65
urrection    0.64
Wick         0.63
steps        0.63
Activations Density 0.052%