INDEX
Explanations
positive adjectives or phrases that describe quality or standard levels
Negative Logits
sten        -0.46
zy          -0.43
eliminated  -0.43
isner       -0.40
psy         -0.40
phase       -0.39
borough     -0.39
planners    -0.39
thur        -0.38
Abstract    -0.37
Positive Logits
sized       0.81
enough      0.59
enough      0.58
decent      0.54
chunk       0.54
Enough      0.51
mble        0.50
tarian      0.50
ally        0.49
Enough      0.48
Activations Density 15.440%
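
The density figure above is not defined on this page. A minimal sketch of one common reading, assuming it means the fraction of sampled token positions on which the feature's activation is nonzero: the function name `activation_density`, the `threshold` parameter, and the sample values are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for eight token positions (not real data).
sample = [0.0, 0.0, 1.2, 0.0, 0.7, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 15.440% would mean the feature is active on roughly one in six or seven sampled tokens.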