INDEX
Explanations
the string "Han" and related words
the name "Han" across various contexts
Negative Logits
anwhile      -0.84
destro       -0.82
utics        -0.71
URES         -0.70
Downloadha   -0.70
ktop         -0.68
okemon       -0.68
ODUCT        -0.67
atoon        -0.67
terday       -0.67
Positive Logits
auer         0.97
uman         0.95
ning         0.91
ifa          0.89
wei          0.88
lon          0.87
Solo         0.87
bang         0.86
wal          0.82
hart         0.81
Activations Density: 0.008%