INDEX
Explanations
numerical values within text
references to specific names or identifiers, particularly in a context of location or cultural identity
Negative Logits
ongs       -0.93
reen       -0.80
arios      -0.78
ores       -0.77
oing       -0.76
oshenko    -0.76
uren       -0.74
cius       -0.73
ists       -0.73
ullivan    -0.73
Positive Logits
âĢİ        0.98
Fallen     0.82
âĢİ        0.78
ICAN       0.77
Barcl      0.74
lot        0.67
tallest    0.66
dent       0.66
STEM       0.65
":["       0.65
Activations Density 0.024%