INDEX
Explanations
phrases indicating certainty or belief
statements indicating certainty or significance regarding ideas and events
Negative Logits
Token     Logit
Breed     -0.74
Outside   -0.62
iates     -0.61
Gore      -0.59
tains     -0.59
Umb       -0.59
Herm      -0.58
Trap      -0.57
catch     -0.56
Chao      -0.56
Positive Logits
Token     Logit
ãĥĩãĤ£    0.74
ngth      0.72
å¦        0.70
alde      0.69
bsite     0.68
$.        0.67
Ĥİ        0.65
ãĥīãĥ©    0.65
æ©        0.65
ãĥ´ãĤ¡    0.65
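Several of the positive-logit tokens (for example ãĥĩãĤ£) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below assumes the model uses the GPT-2 style byte-to-unicode mapping, which this page does not confirm; `decode_token` is a hypothetical helper name, not part of any dashboard API. It maps each character back to its byte and decodes the result as UTF-8.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable character table (assumed tokenizer)."""
    # Bytes that already correspond to printable characters map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 or above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: vocabulary character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into text.

    Tokens that hold only part of a multi-byte character cannot be fully
    decoded; the incomplete bytes become U+FFFD replacement characters.
    """
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥĩãĤ£", "ãĥīãĥ©", "ãĥ´ãĤ¡", "ngth"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the three longer entries decode to katakana fragments (ãĥĩãĤ£ becomes ディ), while the short entries such as å¦ and æ© hold only part of a multi-byte character and have no standalone readable form.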
Activations Density: 0.782%