INDEX
Explanations
proper nouns related to various individuals in different contexts
Negative Logits
tumblr       -0.48
elsius       -0.44
Generations  -0.41
torch        -0.40
rupture      -0.39
caps         -0.38
ilogy        -0.38
FORMATION    -0.38
ategory      -0.38
ATURE        -0.38
Positive Logits
berto        0.54
Olivier      0.51
jen          0.51
ko           0.50
ijn          0.49
Hend         0.46
Koen         0.46
Andre        0.46
jan          0.46
Stephan      0.45
Activations Density: 3.156%