INDEX
Explanations
words indicating approximation or estimation
Negative Logits
Token      Logit
oli        -0.68
ieu        -0.68
era        -0.68
ves        -0.65
ters       -0.65
woods      -0.64
rive       -0.64
rift       -0.62
Rebels     -0.62
Express    -0.62
Positive Logits
Token        Logit
analogous    0.91
820          0.86
WATCHED      0.85
800          0.84
9000         0.84
200          0.83
700          0.81
Ĥİ           0.81
Ń·           0.81
equivalent   0.81
Activations Density: 0.031%