Explanations
identifying unusual or specific things
Negative Logits
Token       Value
stories     0.48
torque      0.48
amigo       0.48
tic         0.46
weight      0.45
یط          0.45
Ridd        0.45
٘           0.45
Fors        0.44
praise      0.44
Positive Logits
Token       Value
επα         0.46
čk          0.45
Β           0.43
δε          0.43
ե           0.43
Ανα         0.42
ανα         0.42
扆          0.42
τελευτα     0.41
अनुमति      0.41
Activations Density: 0.002%