Explanations
the word 'type' followed by a number, particularly 'type 9' and 'type 10'
phrases that reference different categories or classifications
Negative Logits
romeda      -0.72
姫          -0.67
olulu       -0.66
vernment    -0.65
ITED        -0.65
pload       -0.65
ernel       -0.64
borough     -0.64
Morning     -0.63
Liberties   -0.62
Positive Logits
faces       1.21
face        1.16
etter       1.09
casting     0.98
etting      0.94
ahead       0.85
cast        0.77
alias       0.76
classes     0.75
geist       0.73
Activations Density 0.020%