INDEX
Explanations
phrases in a specific language or encoding pattern
special characters or unusual symbols
Negative Logits
Token        Logit
best         -0.74
drawn        -0.74
geries       -0.73
gotten       -0.70
similarity   -0.69
simplest     -0.68
itars        -0.68
best         -0.67
lycer        -0.65
pert         -0.65
Positive Logits
Token        Logit
į            1.61
ÃįÃį         1.03
ķ            1.01
à¤           0.98
£            0.98
Į            0.95
Ģ            0.93
°            0.92
ÑĤ           0.92
Ĭ            0.92
Activations Density: 0.007%