Explanations
No Explanations Found
Negative Logits
speaker     -0.67
honour      -0.66
convict     -0.66
Wr          -0.65
Advocate    -0.64
Lump        -0.63
Ecc         -0.62
ï¸          -0.62
Ancient     -0.62
Sovere      -0.61
Positive Logits
ahime       0.92
igree       0.89
ensen       0.86
aneers      0.82
yrinth      0.81
erella      0.80
yahoo       0.79
merce       0.79
imi         0.78
anya        0.78
Activations Density 0.000%
No Known Activations
This feature has no known activations.