INDEX
Explanations
references to camaraderie or companionship
Negative Logits
Annette       -0.76
Annette       -0.73
pwr           -0.69
tarvit        -0.68
něko          -0.66
stalline      -0.65
ranton        -0.65
Kanz          -0.65
creas         -0.64
ricanes       -0.64
Positive Logits
Fellows       1.30
Fellow        1.23
Fellow        1.21
FELLOW        1.20
Fellowship    1.00
Fellowships   0.99
fellow        0.95
fellow        0.90
fellows       0.88
fellowship    0.84
Activations Density: 0.005%