INDEX
Explanations
words containing the string "wan" with varying activations
the name "Marwan" appearing multiple times in relation to various contexts
Negative Logits
predictive     -0.72
leaflets       -0.71
£ı             -0.69
ttle           -0.69
clustered      -0.68
igslist        -0.65
liability      -0.65
practicable    -0.65
Gemini         -0.64
İĭ             -0.63
Positive Logits
wan            1.24
athan          1.08
igan           1.06
igans          1.05
wu             1.04
ovan           0.99
hao            0.98
nee            0.97
awan           0.97
Kenobi         0.95
Activations Density: 0.007%