Explanations
words related to intentions or meanings
the word "mean" and its various forms, focusing on expressions of intent and significance
Negative Logits
aqu          -0.72
Newsletter   -0.69
icht         -0.65
@#&          -0.65
dfx          -0.63
ttes         -0.62
Sham         -0.61
Frazier      -0.60
ngth         -0.59
anon         -0.58
Positive Logits
spirited      0.86
goodbye       0.84
lessness      0.75
nothing       0.73
something     0.73
INESS         0.72
ック          0.72
exactly       0.71
erella        0.71
bye           0.70
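On SAE feature dashboards, logit lists like the ones above are usually produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k most positive and k most negative tokens. The following is a minimal pure-Python sketch of that computation, assuming that convention; some dashboards apply extra normalization first. The function names, the dict-of-columns layout, and the toy vectors are hypothetical and are not taken from this dashboard.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each token maps to its column of the unembedding matrix W_U.
Unembed = Dict[str, List[float]]


def logit_effects(decoder_dir: List[float], unembed: Unembed) -> Dict[str, float]:
    """Direct logit effect of one feature: the dot product of its decoder
    direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional vectors, invented purely for illustration.
    unembed = {
        "tok_a": [0.8, -0.1],
        "tok_b": [0.5, -0.4],
        "tok_c": [-0.6, 0.2],
        "tok_d": [-0.3, 0.5],
    }
    pos, neg = top_and_bottom(logit_effects([1.0, -0.5], unembed), k=2)
    for tok, val in pos + neg:
        print(f"{tok:8s} {val:+.2f}")
```

Because the score is a single dot product per token, the lists show which output tokens the feature pushes up or down directly. They ignore any effect routed through later layers.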
Activation Density: 0.050%
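Activation density is commonly defined as the fraction of sampled tokens on which the feature fires, meaning its activation exceeds some threshold (often 0). Under that assumed definition, 0.050% corresponds to roughly 1 token in 2,000. Here is a minimal sketch; the `activation_density` function and its `threshold` parameter are hypothetical illustrations, not part of any dashboard API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the token as "firing"
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


if __name__ == "__main__":
    # Illustrative sample: 10,000 tokens, of which 5 fire.
    acts = [0.0] * 9_995 + [1.2] * 5
    print(f"{activation_density(acts):.3%}")
```

The `%` format presentation multiplies by 100 before printing, so a raw fraction of 0.0005 is displayed as a percentage in the same style as the dashboard figure.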